AN ANALYTIC SCALE OF HANDWRITING


C. C. LISTER AND G. C. MYERS 





Brooklyn Training School for Teachers 


In 1910 a new epoch in educational measurement began, when
Thorndike presented the first scale for measuring school products.*
It was a writing scale. This scale was so carefully constructed
that it has served as a model for practically all the scales appearing
since that time. He had one thousand specimens “ranging
from the best to the worst handwriting found in grades 5 to 8”
arranged by from twenty-three to fifty-five competent judges, each
grading the set into groups 1 to 11 by what he considered equal
steps.

Two years later Leonard P. Ayres† constructed a second scale.
This scale was made from 1578 samples of the handwriting of children
of the upper elementary grades of forty school systems in thirty-eight
states. Ten paid investigators made accurately timed readings
of each sample. The legibility was considered the measure
of quality.

These two very carefully constructed scales have been widely 
used for measuring handwriting in the public schools. In spite of 
their many merits they have certain defects, and do not wholly 
meet the needs of the schools. Some of these defects have been 
pointed out by Freeman.** 

The chief criticism from the viewpoint of the practical school 
teacher is that while these scales are more precise in measuring 
writing product than hit-or-miss judgment without a scale, neither 


*Handwriting. Teachers College Record, March, 1910. 

†A Scale for Measuring the Quality of Handwriting of School Children. 16 pp., 1912.
Division of Education, Russell Sage Foundation.

**An Analytic Scale for Handwriting. Elementary School Journal, 15, 1915: 432-441.
















418 THE JOURNAL OF EDUCATIONAL PSYCHOLOGY 


helps the teacher to analyze the child’s writing defects. For Ayres, 
“general quality” is the only guide, and the assumption is that the 
general quality is measured by speed of reading, or legibility. Thorn- 
dike aims to judge three factors combined, legibility, beauty and 
character. After offering a number of criticisms on his own work 
he says (pp. 13-14): “A far more sagacious criticism than either of
these would be that a scale like this for merit in general is less useful
than a scale for legibility alone, or for beauty alone or for character
alone, or for ease alone. Of course, I admit that such specialized
scales are highly desirable, and I hope that this scale for general
merit will stimulate others to the labor of making similar scales for
legibility alone, beauty alone, and so on. But it seems sure that
the scale of most importance and usefulness is that for general merit.”

Now suppose one had such specialized scales and were to rate 
Willie’s writing thereby so as to find it 8 in beauty, 16 in legibility 
and 12 in ease; one would have to say, “Willie, your writing is
quite legible but you must make it more beautiful.” From the
viewpoint of any of the leading systems of teaching penmanship
this diagnosis would be of little value to the teacher or the child. 
On the other hand there are certain specific essentials which are 
looked for by leading penmen. Freeman suggests that the chief 
ones are uniformity of slant, uniformity of alignment, quality of 
line, letter formation and spacing. Accordingly he set out to con- 
struct a scale from these five viewpoints. 

In making his scale Freeman constructed a preliminary chart 
presumably of artificial specimens, to illustrate different degrees in 
these traits. “This was used as a guide by a class in experimental
education composed of advanced students, most of whom were ex- 
perienced in supervision and teaching, upon which to rank a large 
number of specimens into ten degrees of excellence in each trait. 
After the specimens had been so rated, however, it appears to the 
writer that the order was not always the correct one.” Therefore 
Freeman proceeded to “doctor” the results, and as objective meas- 
ures of slant and of alignment “it was only necessary to measure 
the angles of a series of letters and to find the mean variation among 
these angles, or to measure the vertical positions of the tops and 
bottoms of the letters and to measure the variability among those 
positions. When this was done it was found that the order based 
on variability as measured did not correspond to the order on the 
basis of the judgment made by the grades. The order based on 
objective measurement was therefore used as a basis for the selection
of specimens for the scale.” “In the case of quality of line
no means was found upon which to base such objective measurement;
but the characteristic in question was made more prominent
by photographic enlargement. When this was done it was relatively
easy to determine differences in the irregularity of the line
of the writing. On this basis specimens for this chart were selected.”
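
Freeman's objective measure of slant, as quoted above, is the mean variation among the measured angles. The sketch below assumes that “mean variation” means the mean absolute deviation from the average angle, which the quotation does not define; the angles are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def mean_variation(values):
    """Mean absolute deviation of the values from their own average."""
    center = mean(values)
    return mean(abs(v - center) for v in values)


# Hypothetical slant angles, in degrees from the base line, for two specimens.
regular_slant = [60.0, 61.0, 59.0, 60.0, 60.0]
irregular_slant = [50.0, 70.0, 58.0, 66.0, 56.0]

print(mean_variation(regular_slant))    # small value: uniform slant
print(mean_variation(irregular_slant))  # larger value: irregular slant
```

The same function would serve for alignment if the vertical positions of the tops or bottoms of the letters were supplied in place of the angles.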

In the case of letter formation the assistance of a Mr. S. “who 
had developed a system of determining excellence in letter formation 
by the method of counting the errors in form” was called to note 
“the errors in the specimens of the scale which had been selected. 
In some cases the results of this measurement differed from the re- 
sults of the judgment of the graders, and a compromise between the 
two methods of determining was used in selecting the specimens for 
the scale.” 

Of spacing he says: “Various specimens were constructed in 
which the spacing between letters was different, and the judges 
asked to select that specimen in which the letters were the most 
agreeably spaced” . . . . “Then the lower grades were 
constructed by varying this spacing in a variety of ways.” For 
the scale, specimens of writing that compared most closely with 
these standards were chosen. 

“Each chart contains specimens of writing which represents 
three grades of excellence in the characteristics in question.” The 
lowest is valued 1; middle, 3; highest, 5; and double weight given 
to each rank in letter formation. 
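
The values just described can be put in a short sketch. It takes the five traits named earlier, the values 1, 3 and 5, and the double weight on letter formation from the text; that the weighted values are summed into one total is an assumption made here for illustration.

```python
TRAITS = ("slant", "alignment", "quality of line", "letter formation", "spacing")
GRADE_VALUES = {"lowest": 1, "middle": 3, "highest": 5}

# Every trait counts once, except letter formation, which counts double.
WEIGHTS = {trait: 1 for trait in TRAITS}
WEIGHTS["letter formation"] = 2


def freeman_total(ratings):
    """Sum the weighted values; `ratings` maps each trait to a grade name."""
    return sum(WEIGHTS[trait] * GRADE_VALUES[ratings[trait]] for trait in TRAITS)


all_lowest = {trait: "lowest" for trait in TRAITS}
all_highest = {trait: "highest" for trait in TRAITS}
print(freeman_total(all_lowest), freeman_total(all_highest))
```

Under this assumption a specimen can receive only a limited set of totals, which bears on the criticism below that three degrees of merit hardly suffice for precise grading.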

Certainly Freeman set out to meet a definite need; but from the 
description of his mode of procedure there is evidence of considerable
looseness. For example, “class in experimental education,”
“a large number of specimens,” “a series of letters,” “various
specimens,” and “varying this spacing in a variety of ways” are
not terms suggestive of scientific accuracy. Apparently the pre- 
liminary chart was artificial. So far as judges were concerned this 
preliminary chart was really the scale. There is no evidence to 
show how this was constructed. 

For slant and alignment, the opinions of the judges were arbitrarily
thrown out of court and instead “a series of letters” (one knows
not how many, or whether letters of children or of adults) were
actually mechanically measured in respect to regularity of slant and
alignment. This type of measure is perhaps in itself not without
merit; but the shifting of standards is of questionable propriety.

For this as in succeeding traits Freeman notes that “On this
basis specimens for the chart were selected.” Just who selected
the final specimens is not always clear. In the case of letter formation
it seems that a Mr. S. was the final arbiter. Finally, no statement
is made as to how the three specimens used on the scale were
selected from the “ten degrees of excellence” rated by the “class.”

Assuming that Freeman’s scale were sufficiently scientific, it does 
not quite meet the needs which Freeman so clearly has pointed out. 
Three degrees of merit can hardly suffice in diagnosing writing with 
great precision, and in grading as accurately as most teachers are 
called upon to grade. 

Furthermore it appears that no one has attempted to construct 
a scale from a product of any one recognized school system of pen- 
manship. Indeed, Thorndike criticizes his own scale because it 
does not represent all styles of writing. Ayres, however, has ar- 
ranged his specimens on the scale in three rows illustrating slant, 
medium and vertical writing. Apparently the specimens, as pre- 
sented to his “readers,’’ were all mixed and those on the scale 
merely selected from the general results. A few words from Ballou 
in his study of Boston’s spelling are to the point. 

“In conclusion, the writer urges the fundamental importance in 
the standardization of any educational product, or in the evalua- 
tion of questions in a standard test for any subject, of knowing 
that the pupils have received instruction in the field of knowledge 
covered by the scale or the test. Unless scales or the standards
derived from standard tests are based on the results achieved by 
children after proper instruction, then standards established on the 
basis of the results from such tests are unsatisfactory as measures 
of instruction. That Boston pupils spell much better than pupils 
in the 84 cities is evident from evidence here submitted. To the 
writer the reason for this superiority appears to be the fact that 
Boston pupils have received instruction in the words under con- 
sideration; whereas, the standards in the Ayres’ scale are based 
on results obtained from pupils who may never have been taught 
to spell the words in the scale. To assume that the scale establishes
proper standards by which to measure the results of spelling instruction
is to be satisfied with less ability among pupils than may reasonably
be expected if pupils have been properly instructed.”*

To incorporate the leading merits of the writing scales noted 
above, and to supply some of their defects, the writers have attempted 
to construct a scale. In 1914 a uniform style of letters to be taught 
in New York City was adopted by the Board of Education, and teach- 
ing of the muscular movement method of writing in the city schools 
was authorized. Under these favorable conditions the writers, at 
the invitation of Acting Superintendent Straubenmuller, proceeded 
to construct a writing scale for New York City. The following 
which briefly states the mode of procedure appears at the bottom 
of this scale: 

HOW THE SCALE WAS MADE 


The scale represents the average judgment of 21 teachers and 
penmen expert in the muscular system of penmanship, and 4 psy- 
chologists. From 9 schools of Greater New York, representative 
of the best, medium and poorest product of the muscular system of 
penmanship, 3550 specimens were selected from at least one entire
class of each grade from 3B to 8B inclusive. Each specimen repre- 
sented one trial from dictation. 

The specimens from each grade were classified into four piles by 
the writers on the basis of general merit and each pile was thorough- 
ly mixed with its corresponding pile of the several grades. Then 
on the basis of chance three hundred specimens were selected so 
that practically the same number was drawn from each pile. Ac- 
cording to written instructions each of the 25 judges ranked these 
specimens in 8 piles on the basis of equal intervals in merit. Ac- 
cordingly each judge ranked the 300 specimens three times, namely, 
as to form, spacing and movement. 
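
The record sheets produced by this procedure lend themselves to a simple tabulation. The sketch below assumes each judge's sheet is stored as a mapping from pile number (1 for the best pile, as in the instructions to judges) to the specimen numbers placed in it; the three sheets and the specimen numbers are hypothetical.

```python
def average_ranks(record_sheets):
    """Average the pile number each specimen received over all judges.

    record_sheets: one dict per judge, mapping pile number to a list of
    specimen numbers placed in that pile.
    """
    totals, counts = {}, {}
    for sheet in record_sheets:
        for pile, specimens in sheet.items():
            for specimen in specimens:
                totals[specimen] = totals.get(specimen, 0) + pile
                counts[specimen] = counts.get(specimen, 0) + 1
    return {specimen: totals[specimen] / counts[specimen] for specimen in totals}


# Hypothetical sheets from three judges for four specimens.
sheets = [
    {1: [101], 2: [102], 3: [103, 104]},
    {1: [101, 102], 2: [103], 4: [104]},
    {1: [101], 3: [102, 103], 5: [104]},
]
ranks = average_ranks(sheets)
print(ranks)
```

One such tabulation would be made for each of the three essentials, since every judge sorted the specimens three times.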

On the basis of the average rank assigned each specimen the best 
and the poorest were selected as the top and the bottom of the scale. 
Therefrom the exact numerical rank which the other six samples 
should have was determined. The specimens whose average ranks 
are the same as these determined positions, or are nearest them 
were selected. Without exception all the samples on the scale 
are less than .1 from the determined position. 
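
The determined positions can be reproduced by dividing the distance between the best and the poorest average rank into seven equal steps. The sketch below assumes simple linear interpolation, which the text does not name; the end values and the printed figures for Form are those of Table I. The computed positions agree with the printed Desired Values to within about .005, a difference that would arise if the interval had been cut to three decimals before adding.

```python
def determined_positions(best, poorest, positions=8):
    """Return equally spaced values from `best` to `poorest`, inclusive."""
    interval = (poorest - best) / (positions - 1)
    return [best + i * interval for i in range(positions)]


# Average ranks of the best and poorest specimens, as printed in the tables.
ENDPOINTS = {"form": (1.20, 7.87), "movement": (1.12, 7.87), "spacing": (1.25, 8.00)}

# Desired Values printed for Form, and Real Values of the starred samples.
FORM_DESIRED = [1.20, 2.152, 3.104, 4.056, 5.008, 5.960, 6.912, 7.87]
FORM_SELECTED = [1.20, 2.17, 3.12, 4.04, 5.00, 5.96, 7.00, 7.87]

form_positions = determined_positions(*ENDPOINTS["form"])
for rank, (computed, printed) in enumerate(zip(form_positions, FORM_DESIRED), start=1):
    print(rank, round(computed, 3), printed)
```

Comparing `form_positions` with `FORM_SELECTED` also bears out the statement that every selected sample lies less than .1 from its determined position.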

In addition it should be noted that one of the writers (C. C. L.) 
is personally familiar with the general nature and quality of the 
writing in practically every elementary public school of the city; 





*FRANK W. BALLOU. Measuring Boston’s Spelling Ability by the Ayres Spelling 
Scale. School and Society 5: 1917, 720. 
that the writing was selected from 3 schools of Manhattan, from 3
of Brooklyn and from one of each of the other three boroughs and 
that at the outset of selecting the 300 specimens, about a score of 
papers which were blotted or were too dim for photo-engraving, 
were discarded. 

Here is a sample of the directions given: 


INSTRUCTIONS TO JUDGES 

These specimens will be handled by about forty judges. Therefore you will 
please handle them as carefully as possible. 

Please make all judgments with perfect independence. Any suggestions from 
another judge will render the work unreliable. 

First. Arrange the 300 specimens in eight piles according to their order of merit
in movement, so that the intervals between the piles will be as nearly equal as pos- 
sible. The basis of judgment as to movement will be the quality of the lines. Heavy 
tremulous lines indicate poor movement. Frequent lifting of the pen between the 
letters in a word indicates poor movement. Sharp, smooth, clear-cut lines indicate 
good movement. 

As soon as this is done, copy the numbers placed on the specimens, on the blank 
sheet labeled Movement. The numbers found on the papers in the pile of best spec- 
imens should be grouped after One; those on the papers in the second pile should 
be grouped after Two, etc. 

Second. Mix the specimens (take one from each pile and put it on a new pile 
till all are collected) and then arrange them in eight piles according to their merit 
in form. Form includes accuracy in letter formation, uniformity in size and uni- 
formity in slant. 

When the papers have been classified as to form, record the numbers on the blank 
sheet labeled Form, as directed above for recording movement. 

Third. Again mix the specimens and arrange them in eight piles according to 
their merit in spacing. In judging spacing consider correctness and regularity of 
distance between letters and words; spacings that are too wide, too close or irregular 
are undesirable. 

When the samples are arranged as directed tabulate the numbers on the sheet 
Spacing, after the manner indicated above. 

Be sure to sign your name to each record sheet. 

Your co-operation will be appreciated. 

Three types of these directions were provided in which the essen- 
tials appeared, movement—form—spacing, form—movement— 
spacing, spacing—form—movement orders respectively. Each di- 
rection sheet was used about as often as every other, though un- 
fortunately exact distribution was not recorded. Several who gave 
introspections noted that judgment from one point of view did not 
affect judgment from other points of view, yet there was doubtless 
some effect. Whether it would have been better to have had 3 
sets of judges instead of the one no one knows. This, however, 
should be determined experimentally. 
Before each judge began his work these directions were read
over with him to make sure he was certain of how to proceed. He
was especially urged to strive to make the steps between the piles
equal. The judges were told before they volunteered that the task
would take from 4 to 8 hours, and the purpose of the study was
explained to them. While two reported that they did it in a “sitting,”
most of the rest noted that they did only one ranking at a
time.

The data indicating the determined position which each specimen
on the scale should have for movement, form, and spacing,
respectively, are given below; also the samples whose actual rankings
equal or approximate the desired rankings. Those starred are
the samples selected for the scale. Furthermore, the probable
average divergence of the consensus of our 25 judges from the consensus
of opinion of thousands of such judges is indicated below,
along with similar data from Thorndike.


TABLE I
Form

                                               Probable average divergence of
                                               the estimated quality from an
Sample  Sample        Desired  Real            estimate by an infinite number
Rank    Number        Value    Value           of judges.

1       [illegible]   1.20     1.20            .06
2       135*          2.152    2.17            .15
        287                    2.12            .12
        150                    2.20            .21
3       3*            3.104    3.12            .14
        38                     3.17            .14
        82                     3.17            .20
4       167*          4.056    4.04            .15
        218                    4.04            .21
        11                     4.08            .17
        16                     [illegible]     .13
5       109*          5.008    5.00            [illegible]
        94                     5.00            .15
        281                    5.00            .20
6       225*          5.960    5.96            .14
        160                    5.96            .15
        162                    5.95            .15
        290                    5.95            .15
7       222*          6.912    7.00            .09
        226                    7.00            .10
8       [illegible]   7.87     7.87            .04


TABLE II
Movement

                                      Probable average divergence of
                                      the estimated quality from an
Sample  Sample   Desired  Real        estimate by an infinite number
Rank    Number   Value    Value       of judges.

1       36*      1.12     1.12        .04
        42                1.12        .04
        30                1.12        .04
2       92*      2.084    2.00        .18
        291               1.96        .15
3       43*      3.048    3.08        .15
        130               3.08        .25
4       111*     4.012    4.00        .14
        122               4.00        .15
        275               4.00        .19
        192               4.04        .15
        211               4.04        .15
        248               4.04        .14
5       294*     4.976    4.96        .13
        109               4.96        .19
        117               5.08        .20
        178               5.08        .15
6       281*     5.940    5.92        .16
        159               5.92        .18
        84                5.92        .24
        289               5.96        .16
        73                5.96        .17
        68                5.96        .18
7       226*     6.904    6.92        .11
        217               6.92        .12
        134               6.92        .11
8       116*     7.87     7.87        .04
TABLE III
Spacing

                                      Probable average divergence of
                                      the estimated quality from an
Sample  Sample   Desired  Real        estimate by an infinite number
Rank    Number   Value    Value       of judges.

1       37*      1.25     1.25        .07
2       46*      2.214    2.12        .16
        9                             .15
        176                           .16
3       12*      3.178                .14
        129                           .16
        107                           .17
        192                           .16
4       20*      4.142                .15
        194                           .13
        131                           .24
        218                           .29
        23                            .17
        140                           .24
        171                           .17
        286                           .17
5       76*      5.106                .16
        203                           .18
        155                           .21
        250                           .17
        206                           .14
        273                           .15
        184                           .24
        178                           .24
        259                           .21
6       282*     6.070
        295
7       115*     7.03
        105
8       182*     8.00

[Real Values after the first two are missing in the source. The divergences
printed for ranks 6 to 8 are, in order, .18, .18, .21, .16, .12, .10: six
values against five legible sample numbers, so they are not paired with
rows here.]
TABLE IV
Thorndike’s Data (p. 11, Handwriting)

Columns: Sample; Quality; Probable average divergence of the estimated
quality from an estimate by an indefinite number of judges.

[Most of this table is illegible in the source. Legible entries, by column
and in printed order: Sample 32, 17, 21, 31, 14, 126; Quality 16.1, 16.2;
Divergence .14, .43, .18, .15, .15, .14, .14, .19.]
As it happened, several specimens received the same average
rank. In that event the specimen upon which the judges most
closely agreed (indicated by the probable average divergence) was
selected. There is one exception to this rule: for position number
four under Spacing, of the pair of specimens equally near the
true position, the one selected had the larger probable divergence.
The other, however, had been slightly crumpled from handling,
whereby it was rendered poor for photo-engraving. When,
as in a few cases, the probable divergence as well as the average
position was the same for several specimens (e.g. numbers 1 and 7
of Movement), photo-engraving merit determined the choice of the
specimen for the scale.
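
The rule of selection can be stated as a short sketch: take the candidate nearest the determined position and, among candidates equally near, the one with the smallest probable average divergence. The function name and the tuple layout are assumptions made for illustration; the figures are those printed for Movement. The final appeal to photo-engraving merit is a matter of judgment and is not modeled.

```python
def choose_sample(desired, candidates):
    """Pick the sample nearest `desired`, breaking ties by smaller divergence.

    candidates: list of (sample_number, real_value, divergence) tuples.
    """
    def sort_key(candidate):
        _, real_value, divergence = candidate
        # Round the distance so that equal printed values compare as equal.
        return (round(abs(real_value - desired), 3), divergence)

    return min(candidates, key=sort_key)[0]


# Movement, positions 2, 3 and 5, as printed in the table of movement data.
position_2 = [(92, 2.00, 0.18), (291, 1.96, 0.15)]
position_3 = [(43, 3.08, 0.15), (130, 3.08, 0.25)]
position_5 = [(294, 4.96, 0.13), (109, 4.96, 0.19), (117, 5.08, 0.20), (178, 5.08, 0.15)]

print(choose_sample(2.084, position_2))
print(choose_sample(3.048, position_3))
print(choose_sample(4.976, position_5))
```

In each of these three positions the sketch returns the sample that is starred in the table.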

It is rather remarkable that so many specimens should come to
have the same rank. Indeed, those specimens with practically any
one of the numbers listed under each of the eight positions could
have been taken as scale samples without serious departure from
the truth. The fact that the same system of writing was employed
by those writing the specimens may suggest why so many samples
were ranked, on the average, the same. Mere inspection of the product
of the muscular system of penmanship indicates that individuals
write much more like one another than they probably would
were the system used not uniform.

The closeness of the selected specimens to the determined posi- 
tions (less than .1) is noted in the data printed on the scale. On 
this point Thorndike says (p. 10) “As was noted, on page 3, the scale 
is only approximate. 16 on the scale does not pretend to mean 
16.00000, but between 15.9 and 16.1. 8 does not pretend to mean 
8.0, but between 7.9 and 8.1, and as a matter of fact, although I 
have had a thousand samples graded and have chosen as wisely as 
I could, some of the samples do vary in merit from 7, 8, 9, 10, etc., 
by more than .1 plus or minus.” (See data from Thorndike below.)
Thorndike also points out that a new but small varying element 
is introduced by the process of photo-engraving and printing. 

FORM

Form includes accuracy of letter formation, uniformity of slant and size.

[Model specimen.] The above model shows the style of writing taught and illustrates ideal letter formation, size and slant.

[The eight scale specimens for form appear here in the original; the remark printed under each follows.]

This is good elementary school penmanship. Note the uniformity of size, slant, and alinement. The x is crossed carelessly.

This writing is too small; but note the uniformity of slant and size. It is too angular; see m, n, and h.

Good form. The curvature between letters is slightly exaggerated. Note the approach to a, d, and m.

This writing is too angular, especially m and n. It slants too much. Note the careless form of k.

This writing slants too much. The t should not be looped. Note the careless tendency in completing k.

[illegible] irregular. The r's look like [illegible] and the s [illegible] at the top. The x is crossed carelessly. The f is poor.

This writing slants [illegible] and d are poor.

[illegible] letter formation, irregular size, and irregular slant.

Although these data are not easily comparable with those of
Thorndike since he used more degrees in ranking, had more specimens
and more judges, the divergences are considerably lower and suggest as
great, if not greater, reliability than do his. There are certain reasons
why for the same number of judges our data should admit of
slightly greater reliability: the judges looked for specific things;
nearly all were expert penmen in the style of writing judged; the
specimens were all in one style of writing and the same subject matter
was involved. On the other hand Thorndike had more judges
than we had and they judged 1000 instead of 300 specimens. Our
specimens, however, were chosen by lot from 3550.

More judges would have reduced the probable error but the speci- 
mens would have been endangered by wear and slight crumpling. 
Future makers of such scales should have specimens written on 
cardboard or on paper mounted in some way. 
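
How far more judges would reduce the probable error may be illustrated under an assumption. The article does not state how the probable average divergence was computed; the sketch below uses the classical probable error of a mean, 0.6745 times the standard deviation divided by the square root of the number of judges, and a hypothetical standard deviation of one pile.

```python
import math


def probable_error_of_mean(standard_deviation, judges):
    """Classical probable error of an average based on `judges` ratings."""
    return 0.6745 * standard_deviation / math.sqrt(judges)


# Hypothetical spread of one pile among the judges' placements of a specimen.
spread = 1.0
for judges in (25, 50, 100):
    print(judges, round(probable_error_of_mean(spread, judges), 4))
```

On this assumption the error falls only as the square root of the number of judges, so that four times as many judges would be needed to halve it.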

At the top of the scale under movement are the standard letters
as adopted by New York City, and at the top of the form and spacing
columns respectively are model specimens from a copy book, arbitrarily
placed there to illustrate the ideal toward which the teacher
and school should work.

It will be noted that the writers have paid no attention to the 
zero point on the scale. Thorndike strongly emphasized this and 
Ayres also. Likewise scale makers in other subjects have worried 
over it. The attempt by them of course, has been to make a scale 
analogous to physical measures. This attempt doubtless was of 
rare value in developing confidence in scales when the first scale 
was born; but for practical purposes the writers agree with H. T. 
Manuel,* that “elaborate effort to establish a zero point is quite
unnecessary,” and that “any two samples may be taken as fixed
points.”

We have attempted merely to show that the specimens selected 
as the eighth and first specimens on our scale are those selected as 
the highest and lowest from our specimens, and nothing more. 

According to the probable average divergence, the indications 
are that the steps for the several essentials between the first and 
second position, and between the seventh and eighth position, are 
greater than the other intervals on the scale. Especially does this 
seem true with the eighth specimen of spacing, which was ranked 
in that position by every judge. Indeed a number of the judges 
stated that this specimen for spacing “spoiled the equal steps.”
Had the specimens to be judged been selected from a much larger 
number this difficulty might have been slightly decreased. In 
the light of the findings of a number of investigators using the 
ranking method, however, irregularity of intervals is practically 
inevitable, especially for the extremes. For practical purposes 


*The Use of an Objective Scale for Grading Handwriting. Elementary School Journal,
January, 1915. 


Copies of the complete scale may be procured from The Macmillan Company, New York, at 25c each 
this fact is of little importance in respect to the function of the scale.
When, however, arbitrary relative values are assigned to the steps 
on the scale, as the writers have done, the error is slightly exagger- 
ated. It will be noted that the sample for the lowest position is 
assigned a value indicating it to be more than a “step” below 
number 7. Of the shortcomings of these arbitrary values the writ- 
ers are aware, but these values are merely offered as suggestions to 
the many teachers who insist upon having some such guides. Cer- 
tainly, wherever the administrator permits, all grading of the child’s 
writing would better be done in terms of mere position on the scale. 

On the scale appear simple concrete directions for its use. Under 
each specimen are a few suggestive remarks analyzing that spec- 
imen. Freeman gives grades 1, 3 and 5 respectively to each of 
the three points on his scale for each essential except in letter for- 
mation, where double weight is assigned each rank. To arrive at 
relative weights one could determine as above, a general merit 
scale and thereby ascertain to what degree movement, form and 
spacing respectively correlate with this general quality. 
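
The study proposed here would come to computing a coefficient of correlation between each essential and general merit. A minimal sketch, assuming the Pearson product-moment coefficient is the measure intended (the text does not say) and using hypothetical ranks for six specimens:

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical ranks of six specimens; not data from the study.
general_merit = [1, 2, 3, 4, 5, 6]
form = [1, 2, 3, 5, 4, 6]
spacing = [3, 1, 2, 6, 4, 5]

print(round(pearson(general_merit, form), 3))
print(round(pearson(general_merit, spacing), 3))
```

An essential that correlated more closely with general merit might then be given the greater weight, though how coefficients should be turned into weights is left open by the text.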

With the same three hundred specimens which are used in this 
scale, but properly mounted for protection, the writers hope later 
to make such a study. The results thereby obtained would not 
wholly suffice as relative weights for all grades alike, since the writ- 
ing product per se is hardly the equally desirable objective in all 
the grades. From the writers’ viewpoint, at least, the first few
grades should emphasize the ground work of muscular movement 
penmanship and therefore put relatively greatest emphasis upon 
movement. One obvious difficulty with this obtains; the child as 
well as the adult likes to get practical results. If, however, the 
proper standards are set for the child and teacher any desirable 
objective could be made to become “‘practical.’’ In the absence 
of sufficient data the writers would recommend further that equal 
weights be assigned the three essentials in the fifth and sixth grades 
and that more weight be given to form and spacing and less to 
movement in the seventh and eighth years. This scale is of doubt- 
ful value below the 4th grade. 

To determine how the writing of the various grades is distributed 
on our scale is another bit of unfinished business. 
STANDARD GEOGRAPHY TEST—THE WORLD
For Fifth Grades 


ERNEST C. WITHAM 
Southington, Conn. 


The geography test here described was worked out two years 
ago, and since then has been given in several parts of the country. 
Standard is used in connection with this series of tests to distinguish 
them from the old line of examinations. The author regrets ex- 
ceedingly that his administrative duties have prevented his carry- 
ing out some plans of standardization which he has had in mind 
regarding the development of his geography tests. The many requests
for samples and information regarding the tests are the only excuse
for offering this article at this time.


Purpose of the Tests. 

The general purpose of these tests is to enable those responsible 
for the quality of instruction to measure the work done by the 
pupils and teachers in the subject of geography. These tests offer 
a reliable means of getting a large body of facts, and a record of 
geographical thinking in a minimum amount of time. They are 
also intended for wide-awake individual teachers, who are looking 
for school room helps. 


Directions for Giving “The World” Tests.

For supervisory purposes it is best to have the same person give 
this test in all the different fifth grade rooms; but this is not absolute- 
ly necessary. Where several are to give the test, they should all 
be given general instructions beforehand, so as to make the con- 
ditions the same. 

This test should be given in the early part of the spring term to 
fifth grades. By this time the pupils should know the geography 
called for. If it turns out that the class as a whole is not able to 
make a good score on the test, it is not too late in the year to begin 
to remedy the defects. Individuals who are below standard should 
be given special attention. This test will be especially helpful in 
diagnosing their difficulties. Every pupil should be given a set of 
the test papers. A few general words of advice should be given
to the pupils in regard to carefulness, neatness, and honesty. The 
pupils should be told not to hurry as they will be allowed all the time 
they need to complete the tests. (The usual time is about 25 min- 
utes for all to complete the fifth grade test.) The pupils should 
next fill in name, school, and room at the top of the page. 

THE WORLD—FOR FIFTH GRADES


I. On the outline maps printed below find the following geographical divisions 
of the world, and write the name of each across the face of the map. 


1. The United States, Mexico and Central America.
2. Greenland.
3. Dominion of Canada.
4. South America.
5. Alaska.
6. Great Britain (British Isles).
7. Africa.
8. Asia.
9. Australia.
10. Europe.


II.
1. What is the shape of the earth? ....................
2. What are the motions of the earth? ....................
III. On the maps below: 


1. Draw the Equator.
2. Indicate the North Pole.
3. Indicate the South Pole.
4. Write under each map which hemisphere it is.
5. Write in their proper place the names of the following oceans:
   a. Atlantic.
   b. Pacific.
   c. Indian.







IV. Indicate on this map all of the following land and water forms: 


Land Forms Water Forms 
1. Island. 8. Lake. 
2. Mountains. 9. Gulf. 

3. Peninsula. 10. Bay. 

4. Cape. 11. River. 
5. Valley. 12. Ocean. 
6. Mainland. 13. Strait. 
7. Isthmus. 14. Harbor. 
V. What continents are the homes of the following peoples?

VI. Name one great industry of each of these countries:
1. United States

VII. The five largest cities in the world are given in order of their size.
In what countries are they located?



All pupils should now read aloud the first question. The examiner 
should illustrate just what is wanted. This can be done best by 
quickly outlining a rough map on the blackboard. Any one of 
the states will do. For example, Maine. The examiner will then 
say, “Now this is not one of the geographical divisions called for, 
but if it were, you would write the word Maine, like this, right 
across the map. Now turning to your paper, you will notice ten 
geographical divisions called for. Study them over carefully, and 
then write the name of each across the map, just as I have done in 
the case of the map of Maine, which I have roughly represented on 
the board.”

As fast as the pupils complete the first page have them tear it 
off and pass it in. Pupils should go ahead with the other questions 
as fast as they finish the first. They must be reminded to write 
their names on each page before beginning to answer the question. 
They should be told that there are two parts to the second question 
under section II. In section III the pupils should be told to write 
the names of the Atlantic and Pacific oceans on both maps. 
Directions for Scoring “The World” Test.

First sort the papers having all of the first pages in one pile, the 
second pages in another pile, etc. Correct the first pile of papers, 
and mark in the rectangle in the upper right hand corner the number 
of answers attempted, and also the number of answers correct. 
Next correct the second pile of papers. In Section II, answers 
such as—sphere, like an orange or like a ball, are correct. Just the 
word, round, should be called wrong. There are three possible 
points in this section. 

Section III.—Ten possible points. 

Section IV.—Fourteen possible points. 

Section V.—Eight points. 
Answers are as follows: (The order of the answers does not matter.) 
Red: 1. North America. 2. South America.
White: 3. Europe. 4. North America. 5. South America.
6. Yellow: Asia. Japan and China are also correct answers.
7. Black: Africa.
8. Brown: Asia. The East Indies, the Philippines, and the Hawaiian
Islands are also correct answers.

Section VI.—Five points. 

Call correct any of the following answers: 

1. United States: Farming, manufacturing, lumbering, mining,
cotton, fruit growing, cattle raising, etc.
2. Argentina: Cattle raising, wheat.
3. Canada: Lumber, wheat, furs.
4. France: Manufacturing, iron, linen, silk, porcelain, glassware.
5. Russia: Grain or wheat, cattle, lumber.

After the papers have all been corrected, take the first pile, and 
sort the papers into groups according to the number of answers 
attempted. For example, if there are 30 papers, place all with 
ten attempted in one pile, all with nine attempted in the next pile 
and all with 8 attempted in another pile, and so on. Record the 
number attempted on the Class Record Sheet. F means frequency. 
If there are 20 papers with ten attempts in section I, write 20 in 
column F, opposite 10, which is the score (Sc). If there are 5 
papers with nine attempts, write 5 opposite the 9. If there are 2 
with seven attempts mark a 2 opposite the 7. If there are 3 with 
six attempts write 3 opposite the 6. The second column is for the 
sum of the scores. In each case multiply the frequency by the score. 
Add each of the two columns and record the sums at the bottom. Divide
the sum of the second column by the sum of the first. This will 
give the average number of points attempted in section I. 
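
The tabulation just described can be sketched in a few lines of Python. This is only an illustration using the frequencies from the worked example above (20 papers with ten attempts, 5 with nine, 2 with seven, 3 with six); the function and variable names are our own and are not part of the Class Record Sheet.

```python
# Tally from the worked example: score (Sc) -> frequency (F).
attempts = {10: 20, 9: 5, 7: 2, 6: 3}

def class_average(freq_by_score):
    """Return (sum of column F, sum of column S, average score)."""
    total_papers = sum(freq_by_score.values())
    # Column S: each score multiplied by its frequency.
    total_points = sum(score * freq for score, freq in freq_by_score.items())
    return total_papers, total_points, total_points / total_papers

papers, points, average = class_average(attempts)
print(papers, points, round(average, 2))  # 30 277 9.23
```

The same function serves for the "number correct" side of the sheet; only the tally changes.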

Assemble the first pages and sort again according to the number 
of correct answers. Count the number in the several piles and re- 
cord on the right side of Section I of the Class Record Sheet, col- 
umn F. Proceed as before to get the sum and average. The re- 
mainder of the tabulations should be carried out in the same manner 
as in the case of Section I. 

Graph Sheet. To get a class graph make an X as near as pos- 
sible at the point on each section of the graph sheet which indicates 
the number of attempts. Connect these with a solid line. Sim- 
ilarly, construct a graph showing the number of rights, using a 
broken line. 

A few of the interesting replies to the test questions follow: 

Fighting was given by several pupils as a great industry in both 
France and Russia. 
“Industries of Argentina are fruit and earthquakes.”

One pupil located Paris in Germany and another located Chicago
in Michigan.

Only a few instances showed lack of ability to read the questions.
The whole test was arranged with this point in view.


Figures 1 and 2 show two of the class records, giving the distri- 
bution of results of the papers which were scored according to the 
Explanations for Figures 1 and 2
In the tables in Figures 1 and 2 the central column (Sc) indicates the score, the 
columns marked F indicate the frequency or number of pupils making that score, 
and those marked S show the totals, or the frequency times the score. The totals 
on the left give the number attempted, those on the right the number correct. 
Figures 3, 4, and 5 are graphs of the results of the test in three
fifth grade rooms. All the tests were given and scored by the same 
person. Figure 3 is the graph of the results as given in figure 1, 
and figure 4 is the graph of the results as given in figure 2. 
There can be very little question as to which of these classes was
getting the best teaching in geography. It is equally clear
which class was next best. Since the use of these tests I can safely 
say that the class where the poorest work was being done is no longer 
at the foot of the list. The test proved the right stimulus at the 
right time and the result has been good. This particular teacher 
has had years of experience, but believed that she must push the 
pupils through every page of the geography regardless of every- 
thing else. The test has helped wonderfully in bringing about a 
better method of teaching geography. 

Figure 6 shows the scores of three pupils in the school that made 
the best record. The number of right answers of the best, the me- 
dian and the poorest pupils in the class are graphically shown. 
The pupils in the room showing the best results were arranged
according to their rank in the standard test, and their relative
positions are shown by the numbers in the last column of figure 8.
In the first column of this same figure the pupils are arranged
according to their rank on the results of test No. 20, “The World,”
Thompson’s Minimum Essentials. In the second column the pupils
are ranked according to the teacher’s marks in geography. Thus,
beginning at the top of the figure, the first pupil ranked 34 in
the Minimum Essential test, 3 in the teacher’s marks and 15 in the
standard geography test. The second pupil came 6 in the Essential
test, 30 in teacher’s marks and 20 in the standard test, etc.
The connecting lines show graphically the relationship between 
these three sets of ratings. 
The Pearson coefficients of correlation for the results in the 
Essential test, teacher’s marks and standard test are as follows: 
Standard test and teacher’s marks = +.41
Standard test and Minimum Essentials = +.58
Teacher’s marks and Minimum Essentials = +.57
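
For readers who wish to compute such coefficients themselves, the following is a minimal sketch of the Pearson product-moment formula in Python. The two lists of ranks are hypothetical and are not taken from figure 8; they serve only to show the arithmetic.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical ranks of five pupils (illustration only, not the class data).
standard_test = [1, 2, 3, 4, 5]
teacher_marks = [2, 1, 4, 3, 5]

r = pearson(standard_test, teacher_marks)
print(round(r, 2))  # 0.8
```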
A SCALE FOR MEASURING THE ABILITY OF
CHILDREN IN GEOGRAPHY 


E. E. LACKEY 
State Normal School, Wayne, Nebraska 


The scale herein described is a product of the joint efforts of H. 
H. Hahn, Dean of the Department of Education, and E. E. Lackey 
of the Department of Geography, both of the State Normal School 
at Wayne, Nebraska. The latter assumes responsibility for the 
selection of questions, for what constitutes acceptable answers and 
the adequacy of the tests as a whole to fairly cover the field of ge- 
ography. The science of scale-making and the classification of the 
exercises needed in the construction of this scale is the contribution 
of the former. Although each assumes responsibility for the con- 
tribution from his distinctive field yet it was found that each could 
assist the other in almost every phase of the work. The following 
description of the derivation of the scale is the work of Dean Hahn. 


PURPOSE OF SCALE 


Since texts will be used by a large majority of teachers for years 
to come, our primary purpose was to construct a scale for the test- 
ing of the teaching of geography from text-books. But when we 
realized that not one but a number of texts are being taught, we 
had to modify our plan. Our first modification consisted of limiting 
our exercises to the phases of geography treated in common by six 
modern texts. Then we found that some of these phases were treat- 
ed more fully by some authors than they were by others. A second 
modification of our plan was, therefore, necessary; namely, to 
select the common subject matter, or, in other words, the essentials 
of subject matter in each phase. In the selection of the essentials 
of subject matter the common subject matter in these texts was 
largely our guide; but we also checked our exercises by principles 
and minimum essentials as they have been worked out by makers of 
geography curricula. (See 1914 and 1916 Year Books of the Na- 
tional Society for the Study of Education.) Over six hundred ques- 
tions and exercises were selected by three teachers, covering this 
common subject matter. These questions and exercises were then
examined by the authors of the scale, first with reference to repetitions,
and duplications were eliminated. They were next examined for
language difficulty. The wording of many of the exercises was 
changed, some of them were actually tried out on children, and in 
many cases technical expressions which would convey exact meaning 
to mature students of geography were eliminated and the ordinary 
language of children substituted. This is particularly true of the 
exercises in the lower reaches of the scale. The exercises intended 
for the upper reaches of the scale were not freed from technical
expressions the meaning of which pupils are expected to know as
evidence of geography ability. Thus we find such expressions in some
of the exercises of the scale as “the Fall Line,” “climate,”
“continent,” “natural wonders,” “natural geographic barriers,”
“agencies,” “cyclonic storms,” and many others equally technical. The
exercises were examined, in the third place, as to their scope, as 
suggested before. Nothing was included beyond the essentials of 
geography. Finally, the list of exercises was revised so that it 
contained about an equal number of memory and thought questions 
and exercises. 

The scale, therefore, does not test the teaching of any one text. 
It only tests the teaching of that subject matter which it has in 
common with the other five texts. But while it is not a complete 
test for the work done in any one text, it has the compensatory 
virtue of being a test, more or less, of the minimum essentials of 
geography so far as they are determined under the above limitations. 


THE PRELIMINARY TEST 


The exercises were mimeographed in sets of twenty-five each, with
a place for the answer after each exercise. They were given to 
pupils in the fourth, fifth, sixth, seventh, and eighth grades. Their 
instructions were as follows: 

1. Write the answer to each exercise in the spaces directly follow- 
ing it. Some exercises call for two or more things. Be sure to give 
as many things as the exercise calls for. Write merely the answer. 
Do not use complete sentences. 

2. If you cannot answer a question at all, leave spaces for answer 
blank. If you cannot answer because you do not know what the 
exercise means, write “Do not know what it means.”

3. Ask no questions about any of the exercises in the test. Should 
you forget and ask questions, your teacher must refuse to answer 
them. If your teacher should permit you to ask questions and then 
answer them for you, it would defeat the purpose of the whole test
and your answers could not be used. So be sure not to ask a single 
question, but do the best you can in answering each question as it 
is given in the test. 

4. Work as fast as you can, but do not get in a hurry. Be sure
you know what each exercise means before you begin to answer it. 

5. At the bottom of each sheet write your name, grade, school, 
town, date, and the time the whole class begins a test and the time 
when you finish it. 


SCORING OF ANSWER-PAPERS IN PRELIMINARY TEST 


Pupils were not limited as to time. The test was given to 1696 
pupils in twelve schools in two states. The answers, 283,000 in 
number, were all graded by the authors of the scale. They worked 
out the correct answers to the exercises together, and then scored 
the papers accordingly. Where an exercise consisted of two or more 
parts, credit was given for each part answered correctly. Credit was
also given for answers that were somewhat incomplete clearly on 
account of language difficulty. Each set of questions was scored 
on the number of exercises answered correctly, number answered 
incorrectly, number not answered, and the number not answered 
because the meaning was not understood. The percent of correct 
answers to each exercise in each of the five grades for each of the 
twelve schools was determined and tabulated. (These tables are 
ready for inspection or publication, but are not included in this 
article.)

DERIVATION OF THE SCALES 


Scale A 


In deriving scale A the school grades were ignored. The collective 
judgment of all of the pupils—1696 of them—independent of school 
grade and school training, was taken on each exercise. On the 
basis of this collective judgment, the exercises were ranked so as to 
represent different grades of geography difficulty ranging all the way 
from the maximum amount of difficulty expressed on the scale by 
“0” to almost no difficulty expressed by “100.” Since the collective
judgment of 1696 individuals is probably not able to designate more 
than twenty-five grades between the minimum and maximum 
amounts of geography difficulty, the exercises were arranged into 
twenty-five groups or steps. In order to get the differences between 
these steps to represent equal differences of geography difficulty 
or geography ability, it was necessary, on the assumption that geography
ability is distributed among school children in harmony
with the law of chance, to make the percentile values of the twenty- 
five groups of exercises equal respectively to the values of a like num- 
ber of equal divisions of the surface of the so-called normal probabil- 
ity distribution. Accordingly the base line of the normal distribu- 
tion surface was divided into twenty-five equal parts and their values 
to the nearest whole number determined in terms of the surface 
divisions. These twenty-five values are the following: 0, 1, 2, 4,
6, 8, 12, 16, 21, 27, 34, 42, 50, 58, 66, 73, 79, 84, 88, 92, 94, 96, 98, 99, 100.
These values were taken, then, as the values of the twenty-five 
groups of exercises, and each exercise on the basis of the rank given 
it by the collective judgment of the children was placed in its proper 
step of the scale. The exercises that did not have any of these val- 
ues were not used. Some of the exercises had only approximately 
the values of the steps into which they were respectively placed. 
To be exact, in no case does the absolute value of an exercise deviate 
from the approximate value by more than four-tenths of a step, 
and this only in a very few cases. Expressed in per cent., this means 
that 50 represents values from 46.8 to 53.2; 34 represents values 
from 31.2 to 37.2; 27 represents values from 24.6 to 29.8; and so 
on through the other values of the scale. In the construction of 
scale A, exercises have been considered as representing equal ge- 
ography difficulty if the collective judgments they received by the 
pupils in all the grades were equal, an assumption that is probably 
not true and that will not be necessary in deriving scale B. If our 
assumption be true, then all the exercises in each step of scale A 
are equal in geography difficulty. The differences between con- 
secutive steps of the scale are also approximately equal, made so 
by dividing the base line of the surface of normal distribution into 
equal divisions, as explained above. We have, then, in scale A, a 
valid means of measuring ability of children in geography. How- 
ever, the weakness of this scale lies in using for its construction the 
collective judgment of children in different school grades and there- 
fore of unequal amounts of geography training. This possible 
error is eliminated in both scales B and C. 
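
The division of the normal surface and the four-tenths tolerance described above can be sketched as follows. The article does not state the width of each division; the sketch assumes, purely for illustration, that each of the twenty-five divisions is 0.2 of a standard deviation wide and that a step's value is the per cent of the surface lying to the left of its midpoint. Under that assumption the computed values agree with most, though not all, of the printed ones. The function names are ours.

```python
from statistics import NormalDist

STEP_WIDTH = 0.2  # assumption: each division of the base line is 0.2 sigma wide

def step_values(n_steps=25, width=STEP_WIDTH):
    """Per cent of the normal surface to the left of each division's midpoint."""
    centre = (n_steps - 1) / 2
    normal = NormalDist()  # mean 0, standard deviation 1
    return [round(100 * normal.cdf((k - centre) * width)) for k in range(n_steps)]

def tolerance_band(values, i, fraction=0.4):
    """Per cents within four-tenths of a step below and above step i."""
    low = values[i] - fraction * (values[i] - values[i - 1])
    high = values[i] + fraction * (values[i + 1] - values[i])
    return round(low, 1), round(high, 1)

values = step_values()
print(values[9:14])                 # [27, 34, 42, 50, 58]
print(tolerance_band(values, 12))   # (46.8, 53.2)
print(tolerance_band(values, 10))   # (31.2, 37.2)
print(tolerance_band(values, 9))    # (24.6, 29.8)
```

The three bands printed are the ones quoted in the text for the values 50, 34, and 27.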


Scale B. 
The method used in the construction of scale B is a modification 
of the one used by Dr. E. L. Thorndike in his derivation of A Scale 
for Handwriting of Children in Grades 5-8. A brief description of 
this method is found in Strayer and Thorndike’s Educational
Administration, pp. 208-226. The modifications are these: first, in
place of using “from 23 to 55 competent judges” to rank the exer- 
cises in order of geographical difficulty, we used from 18 to 21 seventh 
and eighth grade classes representing 659 different pupils; second, 
instead of depending upon the judgment of “competent judges” 
for the equality of exercises placed in the same group, as to
geography difficulty, we used the actual performance of school children
under normal conditions; and, third, instead of trusting the opinion 
of “competent judges” to get the geography difficulty between suc- 
cessive groups equal, we used the twenty-five values obtained by 
dividing the base line of the normal distribution surface into twenty- 
five equal parts, as was described in connection with the derivation 
of scale A. 

In the construction of scale B the exercises were ranked into twenty- 
five groups in order of geography difficulty, the same as in scale A. 
The values of the twenty-five groups or steps are the same as in
scale A and were determined by an identical process. The rank 
given each exercise by each class was determined by the average 
performance of the members of the class. The final rank given each 
exercise in the scale by all the classes combined was determined 
by their median rank. For instance, exercise 185 was placed by 
the twenty classes that judged it on the basis of average performance, 
in group 12 once, in group 13 two times, in group 14 three times, 
in group 15 eight times, in group 16 three times, in group 17 two 
times, and in group 18 once. The median rank of the twenty 
classes is group 15, and in group 15 exercise 185 was placed in the 
scale. The place of each exercise in scale B was determined in this 
way. 
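
The median placement in the example of exercise 185 can be checked with a short Python sketch. The counts are those given above; the variable names are ours.

```python
from statistics import median

# Placements of exercise 185 by the twenty classes: group -> number of classes.
placements = {12: 1, 13: 2, 14: 3, 15: 8, 16: 3, 17: 2, 18: 1}

# Expand the tally into one entry per class, then take the median.
ranks = [group for group, count in placements.items() for _ in range(count)]
final_group = median(ranks)

print(len(ranks), final_group)  # 20 15.0
```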

It may be argued that since seventh and eighth grade classes were 
used in placing these exercises in scale B the same error is involved 
in its construction as in scale A; namely, the error due to unequal 
amounts of geography training. So far as actual performance is 
a criterion the seventh and eighth grade classes reveal, in the pre- 
liminary test, almost identical geography ability. Furthermore, 
nearly all of the schools that were tested teach very little geography 
in the eighth grade. The test was given to these schools about a 
week before they closed the year’s work. The seventh grade clas- 
ses had practically finished their course in geography and had, 
therefore, approximately the same amount of training as the eighth
grade classes. The little additional training the eighth grade 
classes received was offset by the more recent training in the sev- 
enth grade. Thus the fact that the pupils used in the construction 
of scale B belonged to two different school grades did little or no 
harm. But even this objection, harmless as it is, is eliminated in 
the construction of scale C. (Neither scale A nor scale B has been 
published, but a copy giving merely the number of the exercises in 
each step will be furnished upon request.) 


Scale C 


Scale C is the result of a combination of Thorndike’s method, 
modified as described in connection with scale B, and Ayres’ method 
of scale construction. It is not a single scale, but a multiple scale, 
combining scales for grades 4, 5, 6, 7, and 8. The groups or steps 
and their values are identical with those in scales A and B, and were 
determined by identical methods. To rank the exercises in order 
of geography difficulty in each single grade scale, only pupils of 
that grade were used, thus eliminating the objection of unequal 
amounts of geography training. The principle is generally accepted 
that exercises can be said to be equal in difficulty only when they 
are equal in difficulty for pupils who have had equal amounts of 
training. 

The multiple scale C was derived as follows: first, single scales 
were constructed for grades 4, 5, 6, 7, and 8; secondly, the five 
single scales were then combined to form the multiple scale C for 
the five grades. The single scales were constructed in the same 
way as scale B with the exception of using pupils of one grade only. 
In combining the five single scales the seventh grade scale was used 
as a basis for comparison. To determine the relative position of 
the single scales in combination it was thought best to find the av- 
erage deviation, in terms of scale-steps, of the ranks of the different 
exercises as they appeared in each of the single scales for grades 4, 
5, 6, and 8, from their corresponding ranks in the single scale for 
grade 7, the one used as the basis for comparison. For instance, 
in comparing the position of exercise 220 in the scale for grade 6 
with its position in the scale for grade 7, it was found that there is 
a deviation of two steps; that is, exercise 220 is placed in the former 
scale two steps nearer the “0” group than in the latter scale;
exercise 221 deviates two steps; exercise 222, three steps; exercise
223, two steps, and so on, the total deviation of all the exercises
being four hundred twenty-five steps, or an average deviation of 
approximately two steps. The average deviation between the 
ranks of the exercises in the scales for grades 5 and 7 was found to 
be approximately three steps; between the scales for grades 4 and 
7, approximately five steps, and between the scales for grades 8 and 
7, one-tenth of a step. The average improvement from grade to 
grade, reduced to terms of scale-steps, shows exactly the same step 
differences as does the average deviation. Using, then, the seventh- 
grade scale as the basis for combining the five scales, we placed the 
scale for the eighth grade immediately below that for the seventh 
grade and made their steps to coincide; the scale for the sixth 
grade we placed above the seventh and two steps to the right, next 
the fifth-grade scale and three steps to the right of the seventh; 
and last the fourth-grade scale and five steps to the right of the sev- 
enth. This, then, constitutes the multiple scale C, and is the one 
published under the title, “A Scale for Measuring Ability of Children
in Geography in Grades 4, 5, 6, 7, and 8.”
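
The average-deviation computation used to align one grade scale with the seventh-grade scale can be sketched as follows. Only the four deviations named in the text (exercises 220 to 223) are used; the step positions themselves are hypothetical, chosen merely to reproduce those deviations, and the function name is ours.

```python
# Hypothetical step positions (lower = nearer the "0" group) for the four
# exercises named in the text; only the differences come from the article.
grade7_step = {220: 10, 221: 12, 222: 15, 223: 9}
grade6_step = {220: 8, 221: 10, 222: 12, 223: 7}

def average_deviation(reference, other):
    """Mean number of steps by which `other` sits nearer the "0" group."""
    shared = reference.keys() & other.keys()
    return sum(reference[e] - other[e] for e in shared) / len(shared)

mean_dev = average_deviation(grade7_step, grade6_step)
offset = round(mean_dev)  # whole steps by which the grade-6 scale is shifted

print(mean_dev, offset)  # 2.25 2
```

With the full set of exercises the article reports an average of approximately two steps for grade 6, three for grade 5, five for grade 4, and one-tenth of a step for grade 8.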


CORRELATION OF SCALES A, B, AND C 


The correlation between scales B and C is .99 with an average 
variation of less than .5 of a step. This shows that the two scales 
are almost identical. Either one of these two scales is exact enough 
for measuring ability of children in geography. Scale A shows an 
average variation of approximately two steps from scales B and C, 
thus proving that the objection mentioned in its description is valid. 


REPRESENTATIVE PARTS OF SCALE C 


The following is a part of scale C, “knocked down” and re-ar- 
ranged so it can be presented in straight pages as a part of this 
article. The May, 1918, standards for the various grades are given 
in connection with each step. To illustrate: “Step S—58% (4); 
73% (5); 79% (6); 88% (7); 88% (8)” means that in a test on the 
exercises in Step S the fourth grade should average 58%; the fifth 
grade, 73%; the sixth grade, 79%; the seventh grade, 88%; and 
the eighth grade, 88%. Steps G and H are comparatively difficult 
exercises, Step M is of average difficulty and Steps R and S are com- 
paratively easy. The exercises in Roman type are the ones which 
test the memory and the ones in italics are thought-provoking. 
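
The grade standards quoted for each step follow from the step values and the shifts of the single-grade scales, as the sketch below illustrates. It assumes that the twenty-five steps are lettered A to Y in order of value and that 88 is the value of the step between 84 and 92 (as the Step S standards imply); both are our inferences, and the names are ours.

```python
from string import ascii_uppercase

# Step values shared by all the scales (88 taken as the step between 84 and 92).
VALUES = [0, 1, 2, 4, 6, 8, 12, 16, 21, 27, 34, 42, 50,
          58, 66, 73, 79, 84, 88, 92, 94, 96, 98, 99, 100]

# Steps by which each grade scale is shifted relative to grade 7.
OFFSETS = {4: 5, 5: 3, 6: 2, 7: 0, 8: 0}

def standards(step_letter):
    """Expected per cent correct in each grade for the given step."""
    i = ascii_uppercase.index(step_letter)  # assumed lettering: A -> 0 ... Y -> 24
    return {grade: (VALUES[i - off] if i - off >= 0 else None)
            for grade, off in OFFSETS.items()}

print(standards("S"))  # {4: 58, 5: 73, 6: 79, 7: 88, 8: 88}
print(standards("G"))  # {4: 1, 5: 4, 6: 6, 7: 12, 8: 12}
```

A value of None marks a step that falls beyond the lower end of a shifted grade scale.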

Step G—1% (4); 4% (5); 6% (6); 12% (7); 12% (8).
207. Name three agencies or processes at work making rocks into soil.
215. By what states would you pass in going by boat from Cincinnati to
Memphis?
150. Why is the rainfall of Australia limited to the eastern and southeastern parts?
162. Much of India receives from 12 to 16 inches of rainfall in July and less than
1 inch in January. Explain.
216. Which is the greater distance and why, 30° west of Washington or 30° south
of Washington?
New Orleans is in 30° North Latitude and St. Louis is in 39° North Latitude.
They are in the same Longitude. About how far apart are they in miles?
Step H—2% (4); 6% (5); 8% (6); 16% (7); 16% (8).
123. Which way does your shadow point or extend just before sundown in
midsummer? In mid-winter?
152. How does Russia in Europe compare in size with the United States?
198. Give four marked illustrations of man’s skill in overcoming natural
geographic barriers.
221. Between what two bodies of land is the Bering Sea? Dover? Skager Rak?
Babel Mandeb?
110. What is the “Fall Line,” and why are a number of cities located on it?
144. Why is there a heavy rainfall in the Amazon valley?
157. What is the cause of rain?
177. The heat equator passes through New Mexico in July and through Argentina
in January. Give the chief reason why?
200. Give the principal reason why the coastal cities of the United States have a
more temperate climate than cities of the same latitude in the Great Central
Plains.
191. It is noon at Omaha. State the time at the following places: Baltimore,
Denver, New Orleans, San Francisco.


Step M—16% (4); 27% (5); 34% (6); 50% (7); 50% (8).
90. How could you go to Asia if you wished to make that trip?
93. Name two large rivers of Asia.
111. How can you go by boat from the Hudson River to Lake Erie?
133. Give two reasons for the importance of the Columbia River.
136. Name two of the most important materials shipped on the Great Lakes.
155. Give capitals of Japan and China.
159. Name five important inland cities of Europe.
212. Draw a map of your own state and locate in it two rivers, the capital,
and the largest city.
214. Name the state or territory in which each of the following is located:
Galveston, Washington, St. Paul, Sitka, Savannah, Spokane.
199. What disadvantage do the people of Great Britain suffer as to food supply?
109. Why do the rivers of New England furnish water power for manufacturing?
135. Why is the Rio Grande an important river?
138. Give two reasons why cities usually grow up at waterfalls.
140. Give two reasons why Argentina exports wheat to Brazil rather than to the
United States.
165. What part of Asia is similar to Canada and in what way is it similar?
188. Since the larger part of our iron ore is mined in Minnesota, why is little iron
and steel manufactured there?
117. Why does the earth not look round to us?
76. Why are so many hogs raised in the United States?
Step R—50% (4); 66% (5); 73% (6); 84% (7); 84% (8). 
62. Name the five Great Lakes of North America. 
73. What are the two largest cities of the United States?
29. Name two ways in which the farmer helps to get food for us. 
41. Give two ways in which water gets away when it rains. 
3. What is the name of the circle extending around the earth midway be- 
tween the poles? 
98. Give one reason why the people in the far north use reindeer and dogs in- 
stead of horses. 
37. Why does not the water flow out of swampy places? 
46. Why do not the Eskimos build houses like ours? 
6. What is under the ocean? 
19. Where does the water in a well come from? 
23. Name two ways in which winds are useful. 
Step S—58% (4); 73% (5); 79% (6); 88% (7); 88% (8). 
52. What is the largest city of your state? 
641. Where is Alaska and to whom does it belong? 
84. Name four large cities of Europe. 
92. Give the capitals of France and Germany. 
101. Name two large bodies of water that border on Florida. 
45. Name four things you use for food that do not grow where you live. 
68. Give one reason why so many of the great cities of the United States are near 
the sea coast? 
72. Which is the coldest and which the warmest part of South America? 
THE VOCABULARY TEST AS A MEASURE OF
INTELLIGENCE 


LEWIS M. TERMAN 


Assisted by S. C. Kohs, Mary B. Chamberlain, Mayme Anderson and Bess Henry 
Leland Stanford Jr. University


CRITICISMS OF THE TEST 


Perhaps no mental test in current use as a measure of endowment 
meets such instant criticism from laymen and theoretical psycholo- 
gists as the vocabulary test. Scores of times the writer has been 
very positively informed that vocabulary has nothing to do with 
intelligence, and that even if the two were correlated the relation- 
ship could not be established by a test consisting of only a hundred 
words. The following are the most common objections to the test: 

1. That the number of words known depends upon accident of 
environment and instruction, not upon intellectual endowment; 
that to know the significance of a vocabulary score we should have 
to know the subject’s home, the number of years he had attended 
school, the quality of speech used by his playmates, the number 
of books read, etc. The average person would regard the use of 
the vocabulary test with a school child whose parents speak a 
foreign language as the limit of folly for mental testing, even if the 
subject had spoken English since early childhood. 

2. A second criticism of the test is that any innate ability it may 
measure is a very special ability, not general intelligence. It is 
sometimes asserted that this special ability is negatively correlated 
with intelligence. Psychologists themselves have often contrasted 
the “verbal” type of individual with the “logical” type, to the dis- 
advantage of the former. It is a common opinion that there exist 
feeble-minded persons, of the so-called “fluent” type, who have 
immense vocabularies. 

3. The critic is also sure to insist that even if the vocabulary test 
were valid in principle, no dependence could be placed on the score 
obtained from a 100-word list selected at random from a dictionary
containing 18,000 words.

4. Finally, the test is believed by some to be largely invalidated 
by the personal equation in scoring. 

In view of such criticisms we have taken the trouble to assemble 
here a few facts bearing on the validity of the test in question. What 
has been done is only a beginning, and it is to be hoped that some-
one will before long make a thorough quantitative and qualitative 
study of vocabularies. 


CORRELATIONS WITH MENTAL AGE 


The best way to measure the reliability of an individual test is
to correlate it with the total score from a group of standardized
tests the reliability of which is known. We have used the Stanford
Revision of the Binet-Simon Scale for this purpose. It is
not claimed that this scale is a perfect measure, but the degree of
its reliability is fairly well established. Mr. Otis has devised a
method* for calculating the probable error of a Stanford-Binet
mental age score. Application of this method in a large number
of cases shows the probable error to be less than 6 months in terms
of mental age when the test is used with miscellaneous adults, and
about 3 months when used with children of the first school grade.
When the scale is split endwise and only three tests are used in each
year, the probable error is still only about 7½ months. Any single
test which correlates closely with such a test series may be regarded
as having a high degree of reliability.


TABLE I

[Scatter table of mental age against vocabulary score. The cell entries
are not legible in this copy; the marginal totals follow.]

Mental Age   Cases
19               5
18-6             9
18              12
17-6            18
17              16
16-6            20
16              43
15-6            38
15              45
14-6            37
14              29
13-6            25
13              26
12-6            22
12              25
11-6            18
11              27
10-6            17
10              30
9-6             24
9               28
8-6             23
8               22
7-6             18
7               28
6-6             18
6                6
5-6              2
Total          631

Vocabulary Score    5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85   Total
Cases              13  47  36  43  31  58  34  48  33  77  66  45  53  16  18   8   4     631

r = .91

Note. Mental age 9-6 equals 9-6 to 9-11, inclusive, etc.; vocabulary score 10
equals 10 to 14, inclusive, etc.

Table I shows the correlation between vocabulary score and 
mental age earned on the Stanford Revision. The 631 subjects 
were school pupils scattered from grade I to the first year of high 
school. The intelligence quotients ranged from less than 50 to 
more than 150. All but a few of the children were from homes 
where English is spoken. 

Very few tests or groups of tests yield as high correlations as that
found in the above table (r=.91), but it is the usual thing for the
vocabulary test. It is evident that a mental age based on vocabulary
score alone would not be far wrong in a large per cent. of cases.
We have determined the probable error of such a mental age, which
we may call the “vocabulary mental age,” and found it to be approximately
9½ months. The following table shows the probable
error of a “vocabulary mental age” for various vocabulary score
ranges:





*Described in The Psychological Clinic. 1918.

From the above we can infer that when we are dealing with school
children, mental age based on the vocabulary test alone will not
deviate from a mental age earned by the entire Stanford Revision
more than:

 9½ months in 50 per cent. of cases.
12 months in [illegible] per cent. of cases.
18 months in [illegible] per cent. of cases.
24 months in [illegible] per cent. of cases.
36 months in [illegible] per cent. of cases.
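
Figures of this kind follow from the probable error alone if the deviations are assumed to be normally distributed. The text does not state that assumption, so the following is only a sketch under it; the function name is hypothetical. One probable error spans the middle half of a normal curve, that is, about 0.6745 standard deviations. As a check, the same calculation reproduces the 31, 18 and 4.3 per cent. that the text gives further on for adults with a P. E. of 12 months.

```python
from statistics import NormalDist

# Assumption: deviations are normally distributed, so one probable error
# (P. E.) equals the 75th-percentile z value, about 0.6745 standard deviations.
PE_IN_SD = NormalDist().inv_cdf(0.75)

def pct_deviating_more_than(limit_months, pe_months):
    """Per cent. of cases expected to deviate by more than `limit_months`."""
    z = (limit_months / pe_months) * PE_IN_SD
    return 100 * 2 * (1 - NormalDist().cdf(z))

# P. E. values taken from the text: 9.5 months (children), 12 months (adults).
for pe in (9.5, 12):
    for limit in sorted({pe, 12, 18, 24, 36}):
        pct = pct_deviating_more_than(limit, pe)
        print(f"P. E. {pe} mo.: more than {limit} mo. in {pct:.1f} per cent. of cases")
```

The values printed for a P. E. of 9.5 months are computed under the stated assumption and are not taken from the original table.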


Table III shows the correlation of vocabulary with mental age
in the case of 482 miscellaneous adults, including:

150 “Hoboes” tested by Mr. H. E. Knollin
150 Prisoners tested by Mr. H. E. Knollin
150 delinquent youths tested by Dr. J. Harold Williams
 32 business men


TABLE III

[Scatter table of mental age against vocabulary score. The cell entries
are not legible in this copy; the marginal totals follow.]

Mental Age   Cases
19               5
18-6            11
18              16
17-6            10
17              22
16-6            14
16              27
15-6            32
15              25
14-6            34
14              34
13-6            34
13              42
12-6            30
12              30
11-6            25
11              28
10-6            19
10              18
9-6              5
9               10
8-6              6
8                2
7-6              1
7                2
Total          482

Vocabulary Score   15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95   Total
Cases               5   6  15  34  37  45  59  61  42  40  28  28  29  18  24   9   2     482

r = .81


The correlation in the above table is .81 (Pearson). This is high,
but not quite as high as that found for children. The difference
is readily accounted for by the extraordinarily motley character of
the group, which contained individuals of many races, all degrees
of education and quite a number who had spoken another language
before learning English. For this group the P. E. of a mental age
based on vocabulary approximates 12 months. A mental age thus
secured would not, even for such subjects as these, deviate from a
Stanford-Binet mental age more than:

12 months in 50 per cent. of cases.
18 months in 31 per cent. of cases.
24 months in 18 per cent. of cases.
36 months in 4.3 per cent. of cases.


INTELLIGENCE QUOTIENT AND VOCABULARY


It has been shown that vocabulary correlates highly with mental 
age, or the absolute mental level. We have also raised the ques- 
tion whether it bears any constant relation to intelligence quotient, 
which is an index of relative brightness. In order to answer this 
question we have divided the 640 school children into three groups, 
including respectively those with I. Q. below 86, 86 to 114, and above 
114. The median vocabulary of each of these three groups was 
as follows for the various mental ages: 





TABLE IV 

Mental Age Below 86 86-114 Above 114 
17-6 to 18-5 74 
16-6 to 17-5 66.9 65 
15-6 to 16-5 61.9 56.9 
14-6 to 15-5 57.0 55 
13-6 to 14-5 51.7 50 
12-6 to 13-5 46.7 47.5 43.5 
11-6 to 12-5 43.7 41.4 40.2 
10-6 to 11-5 34 33.7 36.2 
9-6 to 10-5 32.5 29 30.6 
8-6 to 9-5 24 22.2 21.7
7-6 to 8-5 18.3 18 20 
6-6 to 7-5 13.7 12.3 12.5 
No. in Group 112 150 185 


It will be seen that at no mental age do the three groups differ 
considerably in median vocabulary. The only constant tendency 
noticeable is for the group of highest I. Q. to fall slightly below the 
middle group after the mental age of 12 years. We have found 
this tendency quite marked in children above 140 I. Q. 

The last mentioned fact deserves emphasis. We have often 
been told that our subjects who test so unusually high, say 140 
or above, are probably less bright than they seem; that they belong 
to the ‘‘verbal” type and test high on the Stanford-Binet because 
it is so largely a language scale. The reverse of this is actually 
the case. Such children do less well on the language tests of the 
scale than they do on tests which make heavier demands upon reas- 
oning. Mr. Kohs has found that children of exceptionally high 
I. Q. by the Stanford-Binet, consistently earn a still higher I. Q. 
by his Block Design scale, a scale which is made up entirely of performance
tests.

It is also worthy of remark that although the children of a given
mental age who are below 86 I. Q. are from one to several years
older than those of the median and bright groups, this advantage
has practically no effect on the vocabulary score. The latter depends
upon mental level and is but little influenced by chronological
age.

If age apart from intelligence affected the vocabulary very much 
the fact would certainly appear in the median vocabulary scores 
of Knollin’s 482 miscellaneous adults. These were mostly from 
five to forty years older than the “mental age” earned in the Stan- 
ford-Binet test, yet as is shown by the figures in Table III their 
vocabulary scores by mental age were only slightly above those of 
school children of the same mental age, except at the highest levels, 
where the discrepancy is somewhat more marked. 


TABLE V
Median vocabulary scores of children and adults by mental age

[Table entries are not legible in this copy.]

Note. Mental age 7 = 6-6 to 7-5, etc.


The correlations with mental age were computed separately for 
the three groups, bright, average and dull. They were as follows: 


Group        Correlation (Pearson)
Bright              .95
Average             .944
Dull                .86


EFFECT OF FOREIGN LANGUAGE IN THE HOME 


In the last two or three years some three hundred Portuguese 
and Italian children have been tested by various Stanford students. 
Most of these children were from homes in which Portuguese or 
Italian is spoken, usually also English. Although in such cases 
we do not make use of the vocabulary test in reckoning mental age, 
it had been given in 132 cases. The median vocabulary score for 
these children at each mental age is given in Table VI. For comparison
the median scores for the three groups of American children
for the same mental ages are repeated.


TABLE VI

Mental Age   Below 86 I. Q.   86-115   Above 115 I. Q.   Latins
13-6 to 14-5 51.7 50 50 
12-6 to 13-5 46.7 47.5 43.5 42.5 
11-6 to 12-5 43.7 41.4 40.2 33.3 
10-6 to 11-5 34 33.7 35.2 30 
9-6 to 10-5 32.5 29 30.6 25 
8-6 to 9-5 24 22.2 21.7 16.4 
7-6 to 8-5 18.3 18 20 12.3 
6-6 to 7-5 13.7 12.3 12.5 9 


The fact that a majority of these children had learned another
language before learning English is reflected in their inferior vocabulary
scores for three or four years after entering school. After
that, however, the vocabulary rapidly catches up with mental age.
After the mental age of 12 years these children are practically on
a par with their fellow pupils of the same mental level who have
known no other language than English.

For the entire group of 132 Latins with mental ages ranging
from 6 years to the “average adult” level, the correlation of vocabulary
with mental age score was .84, or practically the same as for
the American group with I. Q. below 86.


SEX DIFFERENCES

Of the 631 children, 359 were boys and 272 were girls. As the 
group was a miscellaneous one no comparison of sexes could be 
made on the basis of chronological age, but the median vocabulary 
scores of boys and girls at each mental age were as follows: 


Mental Age      Boys    Girls
18-6 to 19-5    75      76.2
17-6 to 18-5    73      70
16-6 to 17-5    65      68
15-6 to 16-5    61.6    61.8
14-6 to 15-5    51.7    56.5
13-6 to 14-5    50.5    51
12-6 to 13-5    45      49
11-6 to 12-5    42.5    40.5
10-6 to 11-5    35.7    32.5
 9-6 to 10-5    31      31
 8-6 to 9-5     24      21.8
 7-6 to 8-5     20      17
 6-6 to 7-5     13.6    12.5

The differences are so slight that they have little significance.
The small advantages of first one then the other sex are probably 
due to the limited number at each mental age. On the whole, the 
boys are a shade superior up to 12 years, the girls thereafter. The 
increase with each sex, apart from irregularities, is fairly constant. 
Figure I shows that for the sexes taken together the curve of vocabulary
growth by mental age is practically a straight line.

FIGURE 1. THE VOCABULARY SCORE AND MENTAL AGE


RELIABILITY OF VOCABULARY SAMPLING


One way to determine the reliability of a vocabulary test is to 
correlate the scores by the same individuals when tested by two or 
more different tests, each test constructed by the same method of 
random sampling from a dictionary. Five such lists were construct- 
ed by selecting every hundredth word from the Laird and Lee Vest 
Pocket Dictionary, the dictionary on which the Stanford vocabulary 
was based. The first list began with the first word in the dictionary, 
the second with the tenth word, the third with the twentieth, etc. 
As the dictionary contained approximately 18000 words, the selec- 
tion gave 180 words for each list. All five tests were then given 
to 65 Stanford University students attending a class in education.
Most of the students were of the junior and senior classes. Mimeographed
sheets were distributed containing the 180 words of one
test, and the definitions were written. A period of 50 minutes was 
used for each test, one test being given each week. The laborious 
task of grading the written definitions was undertaken by Miss 
Mayme Anderson, a senior student in the department of education. 
The grading was done with extreme care. The names of the stu- 
dents were clipped from the papers so that the scores would not be 
influenced by any unconscious bias of the grader due to acquain- 
tance with members of the class. 
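
The construction of the five lists described above can be sketched as follows. This is a minimal illustration, not the procedure as actually carried out: the word list is a hypothetical stand-in for the dictionary, and the starting points of the fourth and fifth lists (the thirtieth and fortieth words) are an assumption, since the text gives only the first three and "etc."

```python
def systematic_sample(words, start, step=100):
    """Every `step`-th word, beginning at 1-based position `start`."""
    return words[start - 1::step]

# Hypothetical stand-in for the roughly 18,000 dictionary entries.
dictionary = [f"word{i:05d}" for i in range(1, 18001)]

# First, tenth and twentieth words are from the text; 30 and 40 are assumed.
starts = [1, 10, 20, 30, 40]
lists = [systematic_sample(dictionary, s) for s in starts]

for n, word_list in enumerate(lists, start=1):
    print(f"List {n}: {len(word_list)} words, first entry {word_list[0]}")

def estimated_vocabulary(words_known, step=100):
    """Each test word stands for `step` words of the dictionary."""
    return words_known * step
```

Because the starting points differ, the five lists share no words, and a score on any one of them is turned into an estimated total vocabulary by multiplying by 100.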
The intercorrelations of the tests were as follows: 


Test                     1     2     3     4     5    Corr. with av. of all
1                       --    .83   .73   .78   .76          .92
2                      .83    --    .80   .80   .78          .95
3                      .73   .80    --    .80   .72          .90
4                      .78   .80   .80    --    .76          .92
5                      .76   .78   .72   .76    --           .84
Corr. with av. of all  .92   .95   .90   .92   .84
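
As a check on the table above, the two averages quoted in the text can be recomputed from its entries. The sketch below simply averages the ten distinct intercorrelations and the five correlations with the average of all; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import mean

# The ten distinct intercorrelations of the five 180-word tests.
r = {(1, 2): .83, (1, 3): .73, (1, 4): .78, (1, 5): .76,
     (2, 3): .80, (2, 4): .80, (2, 5): .78,
     (3, 4): .80, (3, 5): .72, (4, 5): .76}

# Correlation of each test with the average of all five.
with_average = {1: .92, 2: .95, 3: .90, 4: .92, 5: .84}

mean_inter = mean(r[pair] for pair in combinations(range(1, 6), 2))
mean_with_avg = mean(with_average.values())
print(f"Mean intercorrelation: {mean_inter:.3f}")
print(f"Mean correlation with average of all: {mean_with_avg:.3f}")
```

The first figure comes to about .776, which the summary reports as .77, and the second to .906, as stated in the text.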


Table VII, showing the correlation between test 3 and the average 
of all five, is typical. 


TABLE VII

[Scatter table of average score in the five tests against score in Test 3.
The cell entries and column totals are not legible in this copy; the row
totals follow.]

Average Score   Cases
158-160           1
155-157           1
152-154           5
149-151           2
146-148           4
143-145           7
140-142          16
137-139           7
134-136           4
131-133           4
128-130           6
125-127           2
122-124           1
119-121           0
116-118           0
113-115           2
Total            62

r = .90


The average of the correlations of five separate lists with the 
average score in all is .906. We can conclude, therefore, that a 
single list of 180 words is not greatly inferior in reliability to one of 
900 words (that is, five of 180 words each).

A better method of finding the reliability of a single test as compared
with a composite of five is as follows:

(1) for a given test find each student’s deviation from his average
score in the five tests;

(2) find the probable error of these deviations.

These probable errors are as follows for the five tests:


Test 1, 1.6 words
Test 2, [illegible]
Test 3, [illegible]
Test 4, [illegible]
Test 5, [illegible]

Average P. E. 1.72 words
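
The two steps above can be sketched as follows. The scores are hypothetical, and the probable error is taken here in its empirical sense, as the deviation that half of the cases do not exceed (the median absolute deviation). The text does not say whether this or the normal-theory value of 0.6745 times the standard deviation was used, so the choice is an assumption.

```python
from statistics import mean, median

# Hypothetical scores: each row is one student's results on the five lists.
scores = [
    [140, 142, 138, 141, 139],
    [130, 128, 131, 129, 132],
    [150, 147, 149, 151, 148],
    [120, 124, 122, 121, 123],
    [135, 132, 133, 131, 134],
]

def probable_error_of_test(scores, test_index):
    """Median absolute deviation of one test from each student's own average."""
    # Step (1): deviation of the given test from the student's five-test average.
    deviations = [row[test_index] - mean(row) for row in scores]
    # Step (2): the deviation that half of the students do not exceed.
    return median(abs(d) for d in deviations)

for t in range(5):
    print(f"Test {t + 1}: P. E. = {probable_error_of_test(scores, t):.1f} words")
```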


Since each word in the vocabulary list represents 100 words in
the dictionary, the probable error of 1.72 in the score becomes 172
words in total vocabulary (as based on the dictionary used). In
other words, the chances are even that a total vocabulary based on
the 180 word list will not deviate more than 172 words from a total
vocabulary based on a 900 word list. The chances are 6 to 1 that
it will not deviate more than 344 words (2 P. E.); 25 to 1 that it
will not deviate more than 516 words (3 P. E.); and 140 to 1 that
it will not deviate more than 688 (4 P. E.).

For these 65 students the correlation was found between the av- 
erage score on the five tests and the average class mark earned in 
all the university courses. As this was not done until two years 
after the test was given, the class marks were available for the four 
years of university work. The correlation was .28 Pearson. 


RELIABILITY OF VOCABULARY LISTS WITH SCHOOL CHILDREN 


Two of the 180 word vocabulary lists used with college students 
were given by Mrs. Chamberlain to 32 school children who had 
been tested by the Stanford-Binet. The pupils were fairly evenly 
distributed from the first to the eighth grade. In this case the tests 
were given orally to the pupils taken one at a time as in a Binet 
test. 


Each of these 180 word lists we have divided into three by selecting
every third word. Thus we have six lists of 60 words each.
These may be treated as six separate tests or they may be combined
into tests of 120, 180, 240, 300 or 360 words. Table VIII gives the
intercorrelations of these six tests for the 32 children. The correlations
were found by the Spearman Footrule and converted into
r values by the formula recommended for this purpose by Pearson.
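
The text does not give the formulas. A minimal sketch, assuming the usual statement of Spearman's footrule, R = 1 - 6Σg/(n² - 1), where Σg is the sum of the gains in rank, and assuming the conversion meant is the one commonly attributed to Pearson, r = 2 cos(π(1 - R)/3) - 1, would run as follows. The function names are hypothetical, and tied scores are not handled.

```python
import math

def ranks(scores):
    """Rank 1 = highest score. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def footrule_R(x, y):
    """Spearman's footrule: 1 - 6 * (sum of positive rank differences) / (n^2 - 1)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    gains = sum(max(0, b - a) for a, b in zip(rx, ry))
    return 1 - 6 * gains / (n * n - 1)

def pearson_r_from_R(R):
    """Assumed conversion from footrule R to r: 2 cos(pi (1 - R) / 3) - 1."""
    return 2 * math.cos(math.pi * (1 - R) / 3) - 1

list_a = [1, 2, 3, 4, 5]   # hypothetical scores on one 60-word list
list_b = [2, 1, 3, 4, 5]   # hypothetical scores on another
R = footrule_R(list_a, list_b)
print(f"R = {R:.2f}, r = {pearson_r_from_R(R):.2f}")
```

The conversion leaves the end points unchanged: R = 1 gives r = 1 and R = 0 gives r = 0.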


TABLE VIII 

        a     b     c     d     e     f    abc   def
a       --   .94   .93   .94   .94   .94   .98   .96
b      .94    --   .95   .94   .95   .92   .98   .98
c      .93   .95    --   .94   .94   .93   .97   .96
d      .94   .94   .94    --   .97   .95   .96   .99
e      .94   .95   .94   .97    --   .92   .96   .94
f      .94   .92   .93   .95   .92    --   .94   .97
abc    .98   .98   .97   .96   .96   .94    --   .98
def    .96   .98   .96   .99   .94   .97   .98    --


The correlations in the above table are strikingly uniform and ex- 
tremely high, decidedly higher than those found for the 180 word 
lists used with university students. This is not to be interpreted 
as indicating a higher reliability for a 60-word than for a 180-word 
list. The size of a correlation coefficient is greatly influenced by 
the heterogeneity of the subjects. In this respect the group of 
school children differed greatly from the university group. The 
latter represented a cross section of mental ability; the former 
ranged from six year mental age to “superior adult” by the Stan- 
ford-Binet scale. 

It is a better measure of reliability of a 60-word list to find the 
P. E. of its deviation from the average of the six tests, as we have 
already done for the university students. Following are the P. E. 
values found for the deviations for each of the 60-word lists: 


P. E. of Deviations

Average P. E. 1.48 words

Since each word in a 60-word test represents 300 words in the dictionary,
the P. E. of 1.48 words in the score equals a P. E. of 444
words in total vocabulary. The chances are even that a total vocabulary
(e.g., 7,200, 13,400, etc.) based on a 60-word test will not
differ by more than 444 words from that which would result from a
360-word list. A P. E. of this amount is approximately 5 per cent.
of the total vocabulary of an average 12 year child.

In like manner the average P. E. was computed for the three 120 
word lists and found to be 2.13 words in score. Since each word of 
a 120-word list represents 150 words in the dictionary, the P. E. of 
2.13 equals 320 words in the total vocabulary. The P. E. of a list 
of 180 words is 1.9 in terms of score, or 190 words in total vo- 
cabulary (since each word in this case represents 100 words in the 
dictionary). 
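
The conversions in the last two paragraphs are simple scaling: each test word stands for as many dictionary words as the dictionary total divided by the length of the list. The sketch below reproduces that arithmetic; the function name is illustrative, and the dictionary size is the approximate figure of 18,000 given in the text.

```python
DICTIONARY_WORDS = 18_000  # approximate size given in the text

def total_vocabulary_pe(score_pe, list_length):
    """Scale a P. E. in test words to a P. E. in dictionary words."""
    words_per_item = DICTIONARY_WORDS / list_length
    return score_pe * words_per_item

# (list length, P. E. in score) as reported in the text.
for length, score_pe in [(60, 1.48), (120, 2.13), (180, 1.9)]:
    pe_total = total_vocabulary_pe(score_pe, length)
    print(f"{length}-word list: P. E. of {score_pe} in score = "
          f"{pe_total:.1f} words in total vocabulary")
```

The 120-word case comes to 319.5, which the text gives as 320.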


THE SELECTION OF CRUCIAL WORDS FOR VOCABULARY TEST 


This will be made the subject of a separate report later. Cor- 
relation of each word of the Stanford vocabulary test with mental 
age shows that the words differ greatly in diagnostic value. Some 
present a very steep curve of per cent. passing at successive mental 
ages, others give a curve which is almost horizontal. The latter 
should of course give place to other words chosen for their diagnostic 
value. We believe it will be possible, before long, to measure the 
intelligence level almost as accurately by means of a vocabulary list 
of 100 crucial words as it can now be measured by any existent 
intelligence scale. 


VOCABULARY AND THE TEST OF NAMING WORDS 


In order to answer the question whether the ability to name a
given number of words in three minutes is dependent upon the
number of words known, we have correlated for a miscellaneous
group of 360 children the scores earned on the Stanford Revision
vocabulary test and the Binet test of naming words (three minutes).
The correlation was .49, Pearson. The dependence is only moderate.

ERRORS OF SCORING 


The writer has corrected the definitions, recorded verbatim, of
100 vocabulary tests made by three Stanford students who were
being trained in Binet testing. The vocabulary scores in these
tests ranged fairly evenly from 15 to 65, with an average of approximately
30. Only about 2 per cent. of the scores given
the individual definitions had to be changed, and only in an insignificant
number of cases did the errors in a test have any effect on
the mental age earned. The criticism that the test is vitiated to
any serious extent by difficulty of scoring certainly does not apply
in the case of ordinarily competent examiners.*

However, there are certain words in the Stanford vocabulary 
list which bring out the personal equation in scoring more often than 
the others. Among them are ramble, afloat, artless, outward, south- 
ern, noticeable, quake, nerve, sportive, peculiarity and selectman. 
These should ultimately be replaced by others equally difficult 
but less troublesome to score. 


SUMMARY

1. For a miscellaneous group of 631 school children the correlation
between vocabulary and mental age is .91.

2. The probable error of a mental age based upon the vocabulary
test alone is only 9.6 months, in the case of school children, and the
chances are approximately 6 to 1 that such a mental age will not be
in error more than a year and a half. The probable error is only
12 months in the case of prisoners, hoboes, and other adults of
widely different age, experience and schooling.

3. Children of a given mental age have approximately the same
vocabulary regardless of chronological age.

4. Portuguese and Italian children from homes where a foreign 
language is spoken, are for the first two or three years of school life 
considerably below the median for American children of the same 
mental age. This difference, however, almost totally disappears 
by the time the child has attained the mental age of 12 years. 

5. The median vocabulary at each mental age is practically 
the same for boys and girls. 

6. Vocabulary growth is remarkably constant and regular, 
the curve of medians for the successive mental ages being almost 
a straight line. 





*The following is a striking illustration both of the ease of scoring the vocabulary 
test and of its accuracy. A feeble-minded youth was brought to the writer for a 
mental examination. When the vocabulary test was being given the subject in- 
terrupted with the statement that he had read about “that word test” in the Lit- 
erary Digest and that he had tested himself. Asked how many words he knew, he 
said 36. When the Binet test was finished it was found that the boy had defined 34 
words correctly, which is exactly the median for 11 years. The mental age by the 
complete Stanford-Binet was also exactly 11 years. This feeble-minded boy had 
measured his own intelligence and missed it by only a third of a year!

7. When five different vocabulary tests of 180 words each were
given to 65 university students the intercorrelations ranged from
.72 to .83, with an average of .77. The average correlation of a
single test with the total score for five tests was .906.

8. The probable error of the deviations of a single test from aver- 
age score earned in five tests is only 1.72 words in the score, or 172 
words in total vocabulary. 

9. The correlation of the average score earned in the five tests 
with average university class work was .28. 

10. Six vocabulary tests of 60 words each yielded, in the case of
32 school children, intercorrelations of .92 to .98, with an average
of .94.

11. The probable error of the deviations of a 60-word list from the 
average score earned in six such tests is only 1.48 words, or 444 
words in terms of total vocabulary. The probable error of total 
vocabulary is only 320 words when the average of two such lists 
is taken, and only 190 for the average of three. 

12. The correlation of vocabulary score (in 100 word list) with 
number of words named in three minutes (60 word test) was .49. 

13. The errors in scoring the vocabulary test are negligible for 
examiners who have had adequate training. 








COMMUNICATIONS AND DISCUSSIONS 


A PRELIMINARY NOTE ON THE USE OF THE HAHN- 
LACKEY GEOGRAPHY SCALE 


During the past few months the writer, with the assistance of
some of his graduate students, has been able to make some studies
of the results of the teaching of geography by means of the Hahn-Lackey
scale. In one Brooklyn (N. Y.) school where 37 per cent. of
the children are foreign born, various steps of the scale were given to
460 children in grades 4 to 8, inclusive. The average of all the
tests given was 65.2 per cent. According to the scale the average
should have been 74 per cent. The average of the results in the
eighth grade in the school was within one per cent. of what the
scale called for, using steps “S,” “Q,” “M,” and “P” as a basis.

A second set of papers, based on the very simple steps “W” and
“X” of the scale, has been collected in another large Brooklyn
school with 3600 children. Of these children 35 per cent. are Hebrews
and 10 per cent. Italians. The total number of children
tested, grades 4 to 8 inclusive, was 1595. The grand average for
all the children’s papers was 74.06 per cent. According to the
scale it should have been 94.2 per cent. Here, as in the first school
cited, the children in the 8th grade made a better relative showing
than those in the lower grades. They had an average of 90.1 per
cent. where the scale called for 98.5. They were thus 12 per cent.
more efficient in their answering of geography questions that they
were supposed to know than were the 1595 children taken as a whole.

Comparing these two schools on the basis of steps in the scale 
which were entirely different, we find that the first school was 88 
per cent. efficient, and the second 78.7 per cent. efficient in their 
results in geography. 

A third set of papers based on steps “W” and “X” was collected
in a summer school for children in a college town. 116 papers
were rated in grades 4 to 7 inclusive. The average for all the chil- 
dren was 65.7 per cent. (Cf. 74.06 per cent., which was the average 
made by the second Brooklyn school on the same material.) The 
116 children should have made 93 per cent. Expressed in percent- 
age, this school is only 70 per cent. efficient in geography. 

The “efficiency” rating for the three schools would be, then,
88 for the first, 78.7 for the second and 70 per cent. for the small
school.
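
The note does not define "efficiency," but the figures are consistent with taking the attained average as a per cent. of the average called for by the scale. A minimal sketch under that assumption, with a hypothetical function name:

```python
def efficiency(attained_pct, standard_pct):
    """Attained average as a per cent. of the scale standard (assumed definition)."""
    return 100 * attained_pct / standard_pct

# (attained average, average called for by the scale), as reported above.
schools = {
    "First Brooklyn school": (65.2, 74.0),
    "Second Brooklyn school": (74.06, 94.2),
    "Summer school": (65.7, 93.0),
}

for name, (attained, standard) in schools.items():
    print(f"{name}: {efficiency(attained, standard):.1f} per cent. efficient")
```

On this reading the three ratings come to about 88.1, 78.6 and 70.6 per cent., against the 88, 78.7 and 70 per cent. reported, so the agreement is close but not exact.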


Further investigations will be undertaken, and a full report 
published. 








CHESTER A. MATHEWSON. 
Brooklyn Training School for Teachers. 


THREE METHODS OF TEACHING RADIO TELEGRAPHY 


Last fall the Carnegie Institute of Technology started a night
school course in radio telegraphy for drafted men of Class 1 in
Pittsburgh. This offered an unusual opportunity for experimental
work on the learning process of which we have availed ourselves.
A two minute test in receiving the continental code with
prescribed test material was given at every meeting of the class.
Classes met three times a week. I shall report here one phase of
this investigation, namely, the comparison between several methods
of teaching the telegraphic code.

The classes in radio telegraphy which were organized November
19th and during the two following weeks were divided as nearly
as possible into six groups of equal general ability on the basis of
a series of mental tests. Two classes were taught by the visual
method, two by the phonetic method, and two by the synthetic
method.

The Visual Method. The students were given cards on which 
were printed the dot-dash equivalents of the letters of the alpha- 
bet according to the continental telegraphic code. They were 
asked to write these dot-dash equivalents until thoroughly memor- 
ized. In class the students were given blackboard explanations of 
the dot-dash equivalents, but their practice in class was entirely 
devoted to receiving and interpreting the auditory impressions from 
a buzzer. The objection raised against this method is that the 
student thinks of dots and dashes instead of sounds and rhythm. 
The report reached us that a school of telegraphy in St. Louis em- 
phasizes the sound of the letters, excluding all drill on the visual 
dots and dashes and that the learning time by this procedure is 
considerably lower than that of the visual method. We have tried 
this procedure under the name of the phonetic method. 

The Phonetic Method. The students were asked to avoid learning
the telegraphic code by any visual equivalents, and to concentrate
their efforts on the sound patterns of the letters. We
have good reason to believe that these instructions were quite
generally complied with. Their entire practice-time was devoted
to drill on the interpretation of the auditory sound patterns. They
were probably not confused by thinking simultaneously about the
visual and auditory equivalents of the alphabet. One serious objection
against this method is that the student is deprived of the
ready opportunity of studying the telegraphic code outside of class
hours unless he can obtain practice by hearing others send. Practice
by the visual method requires only one man, while practice by
the phonetic method requires, besides the student listener, a good
sender. On this account the students who faithfully follow the
phonetic method do not practice as many hours a week as the
visual students. In spite of this obvious disadvantage the phonetic
method was tried, since that is the only way to settle the question.
The generally approved method of learning typewriting is the
touch system by which the student is forbidden to look at the typewriter
keyboard, but this method is not the “obvious” one to the
beginner nor to common sense.

During a discussion of these teaching methods in the seminar of 
the Department of Psychology at Carnegie Institute of Technology 
it was suggested that the analogy of the new methods of teaching 
reading in the public schools might be made use of. These methods
teach the student to differentiate between words and do not require
him to analyze the individual letters. We decided to try this
procedure and we have designated it the synthetic method. 

The Synthetic Method. The student is taught to differentiate 
between several short words. The list of words is increased with 
practice and the student is expected to acquire the ability to recog- 
nize the individual letters not as individuals but as parts of words. 
This procedure produces remarkable results in teaching reading and 
it had not been tried in learning telegraphy as far as we could as- 
certain. One objection to the synthetic method is that the mortal- 
ity in the night classes which were taught by the synthetic method 
was extraordinarily high. This was due to the discouragingly slow 
progress during the first few weeks which is to be expected with the 
synthetic method. Even if the method should be superior in the 
end this factor of initial discouragement must be reckoned with. 
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The Results of the Teaching Experiments. The accompanying 
diagram shows the progress of the three groups of radio students 
taught by the three methods. The curves are plotted with re- 
ceiving speed in terms of words per minute as ordinates, and amount 
of practice in terms of class hours. The ordinates represent the 
average receiving speed of 38 ‘students in the visual method, 51 
students in the phonetic method, and 24 students in the synthetic 
method. These figures indicate that the mortality in the
synthetic classes is greater than in the visual and phonetic classes.

The visual method is evidently superior to the phonetic and
synthetic methods. I believe that the primary reason for this is 
that the visual method gives opportunity to practice the code out- 
side of class hours without requiring the assistance of a second 
person. The curves shown here are not theoretically correct since
the practice units of the curves are not comparable, but the curves
do represent the practical state of affairs, namely, (1) that the visual
method affords opportunity for more practice hours per week than 
the phonetic and synthetic methods, and (2) that the advantage 
so gained by the visual method outweighs the advantages of the 
synthetic and phonetic methods. These conclusions do not nec- 
essarily hold for a full-time day course in telegraphy. 

Recommendations. On the basis of observation of progress in 
these classes, we recommended the visual method for initiating the 
student to the telegraphic code. Practice outside of class hours 
may be by the visual method and is voluntary on the part of the 
student. Practice in class is limited mainly to drill in recognizing 
the sound patterns. As soon as the code has been mastered, word 
drill is introduced to help the student in recognizing words as such 
rather than by individual letters. The addition of a series of two 
hundred sample telegrams furnished us through the courtesy of 
Col. J. B. Allison, of the Signal Corps, has been a vital factor in keep- 
ing the interest of night students in the rather monotonous code 
practice.

L. L. THURSTONE.
Division of Applied Psychology,
Carnegie Institute of Technology, Pittsburgh, Pa.

EDITORIAL


EXPERIMENTAL STUDIES IN EDUCATION

In an address at Atlantic City last spring Professor Judd said
that one reason why we do not hear so much about educational
measurements now as we did a year or two ago is that they are
becoming a part of the routine work of the school. Some may deny
the underlying assumption, and insist that we now hear much more
about them than ever before, that there never has been a time when
there was such activity in the construction, perfection and
application of educational scales as at present. The contest of
educational measurements for recognition has passed. No progressive,
alert superintendent, principal or teacher denies their theoretical
significance and possible value.
But whether they have generally become a part of the routine work 
of the school may well be questioned. Would that it were so. 
While most teachers admit that educational measurements are good 
things (when applied in some other school), and while many have 
a vague feeling that the experimental movement is full of promise, 
it is precisely the systematic use of measurements in the routine
work of the school that is conspicuously lacking. All too generally
they are looked upon by the teacher as a new sort of club held over 
her by some outside power (the superintendent or a survey com- 
mission) to drive her to more strenuous exertions. Rarely has the 
teacher come to see that they can be used as aids to enable her to 
accomplish her purposes with more certainty and less expenditure
of effort.

To make educational measurements better known to teachers and 
school officers, to foster the experimental attitude in education, 
and to devise ways and means of employing educational scales 
in school routine, are the aims of a new local organization in New 
York City known as the New York Society for the Experimental 
Study of Education. The society meets one evening a month dur- 
ing the school year for reports on experimental investigations in 
progress, discussions and plans for new investigations, and abstracts 
of experimental studies carried on elsewhere. The membership 
includes superintendents, high and elementary school principals 
and teachers, officials of bureaus of educational research and educa- 
tional foundations, and representatives from teacher training 
schools, colleges and universities. 

It is to be hoped that the society will prove a rallying point for 
all those interested in the scientific study of education in that 
locality, and will contribute materially to a wider appreciation of 
the value of such study to the school administrator, to the teacher 
and to the pupil. Particularly in view of the social and economic 
reconstruction which impends after the war there is need for edu- 
cational authorities to work out methods to determine quickly and 
with a fair degree of accuracy the individual peculiarities and abil- 
ities of each child, to decide upon the kind of educational activities 
that will best fit him for his probable life work, to ensure adequate
opportunity for the development of those broader interests which 
are necessary to good citizenship and a cultured personality, and 
to individualize instruction so that he will always be confronted 
with tasks adjusted to his powers and attainments. These ends 
can only be realized as the result of painstaking and carefully con- 
trolled studies carried on co-operatively by school people under 
school conditions. The society has the hearty approval of Dr. 
W. L. Ettinger, the recently elected superintendent of schools, and 
gives promise of great influence and usefulness. J. 


NOTES AND NEWS 


The latest price list and circular of information from the Bureau 
of Educational Measurements and Standards of the Kansas State 
Normal School at Emporia contains answers to twenty questions 
frequently asked concerning educational measurements, a brief
description of each of the more important standardized tests, and a
bibliography of the more significant articles in the domain of each
school subject. 


The Indiana University Bureau of Cooperative Research, under 
the new director, Dr. Walter S. Monroe, announces a series of 
standardized reasoning tests in arithmetic. The present arrange- 
ment of the tests is based on returns from over 13,000 pupils col- 
lected last spring. The tests are arranged in three sets, one for 
grades IV and V, one for grades VI and VII, and one for grade 
VIII. A final standardization of the tests is planned for the cur- 
rent year. 


A committee on standard tests and measurements of the Con- 
necticut School Superintendents’ Association, consisting of Ernest 
C. Witham, Southington, and Carlon E. Wheeler, New London, 
has recently published an interesting report. A schedule is pre- 
sented providing for one or more standard tests in each month in 
the school year. In this way the burden of testing is distributed, 
and the administration of tests is made a routine matter. The
report contains a brief discussion of the most valuable tests in each
school subject. 


The Municipal Reference Library, City of New York, Dorsey W. 
Hyde, Jr., librarian, has recently inaugurated a series of special 
reports on civic subjects. The first number contains a list of
references on “What to Read on New York City Government,” and the
second presents an interesting discussion of “Teaching Citizenship
via the Movies.”


The Child Health Organization of New York City is making an 
effort to direct the attention of teachers to the physical welfare of 
their pupils. It has adopted the slogan of “Scales in every school,”
and an effort is being made to get teachers to keep a chart of each 
pupil’s height and weight each week, and thus keep a careful watch 
over the pupils’ nutrition. 


The Public Charities Association of Pennsylvania is endeavoring 
to facilitate the formation of special classes in the public schools 
of every community in the state for the special instruction of
mentally exceptional children. It has prepared a bill for the next
legislature providing state aid for school districts which organize 
such classes. It is gathering authentic data as to the precise extent 
of mental retardation and deficiency among children of school age, 
and through mental clinics and examining stations it is trying to 
aid in the identification and classification of backward children. 
This work is largely under the supervision of Dr. Norbert J. Mel- 
ville, of the Philadelphia School of Pedagogy. 


Dr. Harry Kirke Wolfe, professor of philosophy in the University 
of Nebraska, died on July 30 at the age of fifty-nine years. Dr. 
Wolfe was one of the pioneers in experimental psychology and its 
applications to education in America. 


Dr. Eleanor H. Rowland, formerly dean of women at Reed Col- 
lege, has recently become head aide at the Walter Reed Hospital, 
in Washington, D. C., where the first returned American soldiers 
have been received. 


Professor C. E. Seashore, of the University of Iowa, is conducting 
investigations on certain problems of hearing as related to the 
army and navy, and is also devising and standardizing a series of 
tests for the selection of telegraphers and radio operators. R. H. 
Sylvester, assistant professor of psychology in the same university, 
is now lieutenant and chief clinical psychologist at Camp Dodge. 


Dr. H. L. Hollingworth, associate professor of psychology in Bar- 
nard College, has been commissioned captain in the sanitary corps. 


Dr. Samuel C. Kohs has been elected assistant professor of psy- 
chology at Reed College, Portland, Oregon. 


Dr. Cyrus D. Mead, who for six years has been assistant pro- 
fessor of elementary education at the University of Cincinnati, 
has been appointed associate professor in the University of Cal- 
ifornia. Before leaving Cincinnati Dr. Mead was elected presi- 
dent of the Cincinnati Schoolmasters Club. 


At Harvard University Dr. Herbert Sidney Langfeld has been 
appointed acting director of the psychological laboratory.


At the University of Michigan Associate Professor J. F. Shepard 
has been made professor of psychology, and Dr. H. Foster Adams 
has been advanced from instructor to assistant professor of
psychology.